(54) MULTIMODE SPEECH ENCODER AND DECODER



(57) Excitation information is coded in multimode using static and dynamic characteristics of quantized vocal tract parameters, and postprocessing at the decoder side is also performed in multimode, thereby improving the quality of unvoiced speech regions and stationary noise regions.
Description

Technical Field

[0001] The present invention relates to a low-bit-rate speech coding apparatus which performs coding on a speech signal for transmission, for example, in a mobile communication system, and more particularly, to a CELP (Code Excited Linear Prediction) type speech coding apparatus which separates the speech signal into vocal tract information and excitation information for representation.

Background Art 


[0002] Speech coding apparatuses, which compress speech information and encode it with high efficiency to make good use of radio signals and recording media, are used in the fields of digital mobile communications and speech storage. Among them, systems based on CELP (Code Excited Linear Prediction) are widely put into practice for apparatuses operating at medium to low bit rates. The CELP technology is described in "Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates" by M.R. Schroeder and B.S. Atal, Proc. ICASSP-85, 24.1.1, pp. 937-940, 1985.

[0003] In the CELP type speech coding system, speech signals are divided into frames of a predetermined length (about 5 ms to 50 ms), linear prediction of the speech signals is performed for each frame, and the prediction residual (excitation vector signal) obtained by the linear prediction for each frame is encoded using an adaptive code vector and a random code vector comprised of known waveforms. The adaptive code vector and random code vector are selected respectively from an adaptive codebook storing previously generated excitation vectors and a random codebook storing a predetermined number of pre-prepared vectors with predetermined shapes. Used as the random code vectors stored in the random codebook are, for example, random noise sequence vectors and vectors generated by arranging a few pulses at different positions.

[0004] The CELP coding apparatus performs the LPC synthesis and quantization, pitch search, random codebook search, and gain codebook search using input digital signals, and transmits the quantized LPC (L), pitch period (P), a random codebook index (S) and a gain codebook index (G) to a decoder.
[0005] However, the above-mentioned conventional speech coding apparatus needs to cope with voiced speech, unvoiced speech and background noise using a single type of random codebook, and therefore it is difficult to encode all the input signals with high quality.



Disclosure of Invention 

[0006] An object of the present invention is to provide a multimode speech coding apparatus and speech decoding apparatus capable of providing multimode excitation coding without newly transmitting mode information, in particular, capable of judging speech region/non-speech region in addition to judging voiced region/unvoiced region, and further increasing the improvement in coding/decoding performance obtained with the multimode.
[0007] In the present invention, the mode determination is performed using static/dynamic characteristics of a quantized parameter representing spectral characteristics, and the modes of various codebooks for use in coding excitation vectors are switched based on the mode determination indicating the speech region/non-speech region or voiced region/unvoiced region. Further, in the present invention, the modes of various codebooks for use in decoding are switched using, in decoding, the mode information used in the coding.

Brief Description of Drawings

[0008] 

FIG.1 is a block diagram illustrating a speech coding apparatus in a first embodiment of the present invention;

FIG.2 is a block diagram illustrating a speech decoding apparatus in a second embodiment of the present invention;

FIG.3 is a flowchart for speech coding processing in the first embodiment of the present invention;

FIG.4 is a flowchart for speech decoding processing in the second embodiment of the present invention;

FIG.5A is a block diagram illustrating a configuration of a speech signal transmission apparatus in a third embodiment of the present invention;

FIG.5B is a block diagram illustrating a configuration of a speech signal reception apparatus in the third embodiment of the present invention;

FIG.6 is a block diagram illustrating a configuration of a mode selector in a fourth embodiment of the present invention;

FIG.7 is a block diagram illustrating a configuration of a multimode postprocessing section in a fifth embodiment of the present invention;

FIG.8 is a flowchart for the former part of the multimode postprocessing in the fourth embodiment of the present invention;

FIG.9 is a flowchart for the latter part of the multimode postprocessing in the fourth embodiment of the present invention;

FIG.10 is a flowchart for the entire part of the multimode postprocessing in the fourth embodiment of the present invention;

FIG.11 is a flowchart for the former part of the multimode postprocessing in the fifth embodiment of the present invention; and

FIG.12 is a flowchart for the latter part of the multimode postprocessing in the fifth embodiment of the present invention.

Best Mode for Carrying Out the Invention 

[0009] Speech coding apparatuses and others in embodiments of the present invention are explained below using FIG.1 to FIG.9.



(First embodiment) 






[0010] FIG.1 is a block diagram illustrating a configuration of a speech coding apparatus according to the first embodiment of the present invention.
[0011] Input data, comprised of, for example, digital speech signals, is input to preprocessing section 101. Preprocessing section 101 performs processing such as removal of the direct current component and bandwidth limitation of the input data using a high-pass filter and band-pass filter, and outputs the result to LPC analyzer 102 and adder 106. Although it is possible to perform the subsequent coding processing without any processing in preprocessing section 101, the coding performance is improved by performing the above-mentioned processing.

[0012] LPC analyzer 102 performs linear prediction analysis, and calculates linear predictive coefficients (LPC) to output to LPC quantizer 103.
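The text does not say how LPC analyzer 102 computes the coefficients. As an illustration only, the following minimal sketch assumes the common autocorrelation method with the Levinson-Durbin recursion; the function names `autocorrelation` and `levinson_durbin` are hypothetical and not taken from the specification. The recursion also yields reflection coefficients and the residual power, which are the kinds of converted parameters mentioned later for mode selection.

```python
def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor a[1..order] such that x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, reflection, err). a[0] is unused and left at 0.0,
    reflection holds one coefficient per recursion stage, and err is the
    final prediction residual power.
    """
    a = [0.0] * (order + 1)
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            # Assumption: for a silent frame keep the remaining terms at zero.
            reflection.append(0.0)
            continue
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        reflection.append(k_m)
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= (1.0 - k_m * k_m)
    return a, reflection, err
```

For a frame whose autocorrelation decays as r[k] = 0.5^k, the sketch recovers a first-order predictor with coefficient 0.5 and a zero second coefficient.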
[0013] LPC quantizer 103 quantizes the input LPC, outputs the quantized LPC to synthesis filter 104 and mode selector 105, and further outputs a code L that represents the quantized LPC to the decoder. The quantization of LPC is usually performed after the LPC are converted to LSP (Line Spectrum Pair), which have better interpolation characteristics.

[0014] Synthesis filter 104 is an LPC synthesis filter constructed using the quantized LPC input from LPC quantizer 103. With the constructed synthesis filter, filtering processing is performed on an excitation vector signal input from adder 114, and the resultant signal is output to adder 106.
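The filtering performed by the synthesis filter can be sketched as an all-pole recursion. This is a minimal illustration under assumptions: the sign convention y[n] = e[n] + sum_k a[k] y[n-k], the function name `synthesis_filter`, and the explicit `state` argument (standing in for the filter memory that is carried from one subframe to the next) are all choices made here, not details given in the specification.

```python
def synthesis_filter(excitation, lpc, state=None):
    """All-pole LPC synthesis: y[n] = e[n] + sum_k lpc[k-1] * y[n-k].

    lpc holds the (quantized) predictor coefficients a[1..p]. state holds
    the last p output samples, most recent first, so that filtering can
    continue across subframes. Returns (output, new_state).
    """
    p = len(lpc)
    mem = list(state) if state is not None else [0.0] * p
    out = []
    for e in excitation:
        y = e + sum(a * m for a, m in zip(lpc, mem))
        out.append(y)
        mem = ([y] + mem)[:p]  # shift the newest output into the memory
    return out, mem
```

Feeding a unit impulse through a first-order filter with coefficient 0.5 gives the decaying response 1, 0.5, 0.25, 0.125, and the returned state lets the next subframe continue from there.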
[0015] Mode selector 105 determines a mode of 
random codebook using the quantized LPC input from 
LPC quantizer 103. 

[0016] At this time, mode selector 105 stores previously input information on quantized LPC, and performs the selection of mode using both the characteristics of the evolution of quantized LPC between frames and the characteristics of the quantized LPC in the current frame. There are at least two types of modes, examples of which are a mode corresponding to a voiced speech segment, and a mode corresponding to an unvoiced speech segment and stationary noise segment. Further, as information for use in selecting a mode, it is not necessary to use the quantized LPC themselves, and it is more effective to use converted parameters such as the quantized LSP, reflective coefficients and linear prediction residual power.
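The idea of combining a static feature of the current frame with a dynamic feature across frames can be sketched as follows. This is a toy illustration, not the decision logic of the invention: the class name `ModeSelector`, the two features chosen (first reflective coefficient as a spectral-tilt proxy, mean squared LSP change as the evolution measure), the decision rule, and both thresholds are assumptions made for the example.

```python
class ModeSelector:
    """Toy two-mode selector driven only by quantized spectral parameters."""

    VOICED = "voiced"
    UNVOICED_OR_NOISE = "unvoiced_or_stationary_noise"

    def __init__(self, evolution_threshold=0.05, tilt_threshold=0.3):
        # Both thresholds are hypothetical values chosen for illustration.
        self.prev_lsp = None
        self.evolution_threshold = evolution_threshold
        self.tilt_threshold = tilt_threshold

    def select(self, quantized_lsp, first_reflection):
        # Dynamic feature: mean squared change of the LSP since the last frame.
        if self.prev_lsp is None:
            evolution = 0.0
        else:
            evolution = sum((c - p) ** 2
                            for c, p in zip(quantized_lsp, self.prev_lsp))
            evolution /= len(quantized_lsp)
        self.prev_lsp = list(quantized_lsp)  # stored for the next frame

        # Static feature: strong low-pass tilt in the current frame.
        tilted = first_reflection > self.tilt_threshold
        changing = evolution > self.evolution_threshold
        if tilted or changing:
            return self.VOICED
        return self.UNVOICED_OR_NOISE
```

Because the selector reads only quantized parameters, a decoder holding the same code L could run the same class and arrive at the same mode, which is why no separate mode information would need to be transmitted.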

[0017] Adder 106 calculates an error between the preprocessed input data input from preprocessing section 101 and the synthesized signal to output to perceptual weighting filter 107.

[0018] Perceptual weighting filter 107 performs perceptual weighting on the error calculated in adder 106 to output to error minimizer 108.

[0019] Error minimizer 108 adjusts a random codebook index Si, adaptive codebook index (pitch period) Pi, and gain codebook index Gi, which are respectively output to random codebook 109, adaptive codebook 110, and gain codebook 111; determines the random code vector, the adaptive code vector, and the random codebook gain and adaptive codebook gain to be generated respectively in random codebook 109, adaptive codebook 110, and gain codebook 111 so as to minimize the perceptually weighted error input from perceptual weighting filter 107; and outputs a code S representing the random code vector, a code P representing the adaptive code vector, and a code G representing gain information to the decoder.

[0020] Random codebook 109 stores a predetermined number of random code vectors with different shapes, and outputs the random code vector designated by the random code vector index Si input from error minimizer 108. Random codebook 109 has at least two types of modes. For example, random codebook 109 is configured to generate a pulse-like random code vector in the mode corresponding to a voiced speech segment, and a noise-like random code vector in the mode corresponding to an unvoiced speech segment and stationary noise segment. The random code vector output from random codebook 109 is generated with the single mode selected in mode selector 105 from among the at least two types of modes described above, and is multiplied by the random codebook gain Gs in multiplier 112 to be output to adder 114.
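A two-mode random codebook of this kind might be organized as in the sketch below. The codebook contents are purely illustrative: the helper names, the use of two signed unit pulses per vector, the Gaussian noise vectors, and the fixed seeds are assumptions; real codebooks would be designed or trained rather than drawn at random.

```python
import random


def build_pulse_codebook(size, length, pulses=2, seed=0):
    """Pulse-like vectors: a few signed unit pulses at different positions."""
    rng = random.Random(seed)
    book = []
    for _ in range(size):
        vec = [0.0] * length
        for pos in rng.sample(range(length), pulses):
            vec[pos] = rng.choice((-1.0, 1.0))
        book.append(vec)
    return book


def build_noise_codebook(size, length, seed=1):
    """Noise-like vectors: random noise sequences."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(size)]


class MultimodeRandomCodebook:
    """Holds one sub-codebook per mode; the mode picks which one is read."""

    def __init__(self, size=8, length=40):
        self.books = {
            "voiced": build_pulse_codebook(size, length),
            "unvoiced_or_stationary_noise": build_noise_codebook(size, length),
        }

    def vector(self, mode, index):
        return self.books[mode][index]
```

The same index therefore refers to a different waveform depending on the selected mode, so the index bits are reused rather than extended.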

[0021] Adaptive codebook 110 performs buffering while sequentially updating the previously generated excitation vector signal, and generates the adaptive code vector using the adaptive codebook index (pitch period (pitch lag)) input from error minimizer 108. The adaptive code vector generated in adaptive codebook 110 is multiplied by the adaptive codebook gain Ga in multiplier 113, and then output to adder 114.
[0022] Gain codebook 111 stores a predetermined number of sets of the adaptive codebook gain Ga and random codebook gain Gs (gain vectors), and outputs the adaptive codebook gain component Ga and random codebook gain component Gs of the gain vector designated by the gain codebook index Gi input from error minimizer 108 respectively to multipliers 113 and 112. If the gain codebook is constructed with a plurality of stages, it is possible to reduce the memory amount required for the gain codebook and the computation amount required for the gain codebook search. Further, if the number of bits assigned to the gain codebook is sufficient, it is possible to scalar-quantize the adaptive codebook gain and random codebook gain independently of each other.

[0023] Adder 114 adds the random code vector and the adaptive code vector respectively input from multipliers 112 and 113 to generate the excitation vector signal, and outputs the generated excitation vector signal to synthesis filter 104 and adaptive codebook 110.
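The signal path through the adaptive codebook, the two multipliers and the adder can be sketched as below. The function names are hypothetical, and the handling of a pitch lag shorter than the vector length (periodic repetition of the fetched segment) is a common convention assumed here rather than something stated in the text.

```python
def adaptive_code_vector(past_excitation, lag, length):
    """Fetch `length` samples starting `lag` samples back in the buffer.

    Assumes 1 <= lag <= len(past_excitation). If lag < length, the fetched
    segment is repeated periodically (an assumed convention).
    """
    segment = past_excitation[len(past_excitation) - lag:]
    return [segment[i % lag] for i in range(length)]


def build_excitation(adaptive_vec, random_vec, ga, gs):
    """Adder output: Ga * adaptive code vector + Gs * random code vector."""
    return [ga * a + gs * c for a, c in zip(adaptive_vec, random_vec)]


def update_adaptive_codebook(past_excitation, excitation):
    """Shift the buffer: drop the oldest samples and append the newest."""
    return (past_excitation + excitation)[len(excitation):]
```

Feeding each generated excitation back through `update_adaptive_codebook` is what lets the adaptive codebook represent the pitch periodicity of the signal in later subframes.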
[0024] In this embodiment, although only random codebook 109 is provided with the multimode, it is possible to provide adaptive codebook 110 and gain codebook 111 with the multimode as well, and thereby to improve the quality.

[0025] The flow of processing of the speech coding method in the above-mentioned embodiment is next described with reference to FIG.3. This explanation describes the case where, in the speech coding processing, the processing is performed for each processing unit with a predetermined time length (a frame with a time length of a few tens of msec), and the processing is further performed for each shorter processing unit (subframe) obtained by dividing a frame into an integer number of parts.

[0026] In step (hereinafter abbreviated as ST) 301, all the memories such as the contents of the adaptive codebook, synthesis filter memory and input buffer are cleared.

[0027] Next, in ST302, input data such as a digital speech signal corresponding to a frame is input, and filters such as a high-pass filter and band-pass filter are applied to the input data to perform offset cancellation and bandwidth limitation of the input data. The preprocessed input data is buffered in an input buffer to be used for the following coding processing.
[0028] Next, in ST303, the LPC (linear predictive coefficients) analysis is performed and LP (linear predictive) coefficients are calculated.
[0029] Next, in ST304, the quantization of the LP coefficients calculated in ST303 is performed. While various quantization methods of LPC have been proposed, the quantization can be performed effectively by converting the LPC into LSP parameters, which have good interpolation characteristics, and applying predictive quantization that utilizes multistage vector quantization and inter-frame correlation. Further, for example, in the case where a frame is divided into two subframes, it is general to quantize the LPC of the second subframe, and to determine the LPC of the first subframe by interpolation processing using the quantized LPC of the second subframe of the last frame and the quantized LPC of the second subframe of the present frame.
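The two-subframe interpolation described above can be sketched in the LSP domain as follows. The function name and the default weight of 0.5 (a simple midpoint) are assumptions for illustration; the text states only that the first subframe is obtained by interpolating between the two quantized second-subframe parameter sets.

```python
def interpolate_subframe_lsp(prev_second_lsp, curr_second_lsp, weight=0.5):
    """Return [first_subframe_lsp, second_subframe_lsp] for the current frame.

    The first subframe is a weighted mix of the quantized LSP of the previous
    frame's second subframe and of the current frame's second subframe; the
    second subframe uses the current quantized LSP directly.
    """
    first = [(1.0 - weight) * p + weight * c
             for p, c in zip(prev_second_lsp, curr_second_lsp)]
    return [first, list(curr_second_lsp)]
```

Only one parameter set per frame is then transmitted, while each subframe still gets its own filter.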
[0030] Next, in ST305, the perceptual weighting filter that performs the perceptual weighting on the preprocessed input data is constructed.



[0031] Next, in ST306, a perceptual weighted synthesis filter that generates a synthesized signal in the perceptual weighting domain from the excitation vector signal is constructed. This filter is comprised of the synthesis filter and perceptual weighting filter connected in cascade. The synthesis filter is constructed with the quantized LPC obtained in ST304, and the perceptual weighting filter is constructed with the LPC calculated in ST303.

[0032] Next, in ST307, the selection of mode is performed. The selection of mode is performed using static and dynamic characteristics of the quantized LPC obtained in ST304. Examples of specifically used characteristics are an evolution of quantized LSP, reflective coefficients calculated from the quantized LPC, and prediction residual power. Random codebook search is performed according to the mode selected in this step. There are at least two types of modes to be selected in this step. An example considered is a two-mode structure of a voiced speech mode, and an unvoiced speech and stationary noise mode.
[0033] Next, in ST308, adaptive codebook search is performed. The adaptive codebook search is to search for an adaptive code vector that generates the perceptual weighted synthesized waveform closest to the waveform obtained by performing the perceptual weighting on the preprocessed input data. The position from which the adaptive code vector is fetched is determined so as to minimize an error between a signal obtained by filtering the preprocessed input data with the perceptual weighting filter constructed in ST305, and a signal obtained by filtering the adaptive code vector fetched from the adaptive codebook as an excitation vector signal with the perceptual weighted synthesis filter constructed in ST306.
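A closed-loop search of this kind can be sketched as an exhaustive loop over candidate lags. Several points are assumptions made for the example: the function name, the `weighted_synthesis` callable standing in for the filter of ST306, the use of the optimal-gain criterion (maximizing correlation squared over energy, which is equivalent to minimizing the squared error when the gain is free), and the restriction to positive correlation.

```python
def search_adaptive_codebook(target, past_excitation, min_lag, max_lag,
                             weighted_synthesis):
    """Return the lag whose filtered adaptive code vector best matches target.

    target: perceptually weighted input for this subframe.
    weighted_synthesis: callable mapping an excitation list to its filtered
    output (a stand-in for the perceptual weighted synthesis filter).
    """
    n = len(target)
    best_lag, best_score = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        segment = past_excitation[len(past_excitation) - lag:]
        candidate = [segment[i % lag] for i in range(n)]
        y = weighted_synthesis(candidate)
        energy = sum(v * v for v in y)
        corr = sum(t * v for t, v in zip(target, y))
        if energy == 0.0 or corr <= 0.0:
            continue  # assumption: only non-negative adaptive gains are allowed
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With a past excitation that repeats every five samples and an identity filter, the search returns a lag of 5, the shortest lag that reproduces the target.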
[0034] Next, in ST309, the random codebook search is performed. The random codebook search is to select a random code vector that generates an excitation vector signal such that the resulting perceptual weighted synthesized waveform is the closest to the waveform obtained by performing the perceptual weighting on the preprocessed input data. The search takes into consideration that the excitation vector signal is generated by adding the adaptive code vector and random code vector. Accordingly, the excitation vector signal is generated by adding the adaptive code vector determined in ST308 and the random code vector stored in the random codebook. The random code vector is selected from the random codebook so as to minimize an error between a signal obtained by filtering the generated excitation vector signal with the perceptual weighted synthesis filter constructed in ST306, and the signal obtained by filtering the preprocessed input data with the perceptual weighting filter constructed in ST305. In the case where processing such as pitch period processing is performed on the random code vector, the search is performed also in consideration of such processing. Further, this random codebook has at least two types of modes. For example, the search is performed using the random codebook storing pulse-like random code vectors in the mode corresponding to the voiced speech segment, and using the random codebook storing noise-like random code vectors in the mode corresponding to the unvoiced speech segment and stationary noise segment. Which mode of the random codebook is used in the search is selected in ST307.

[0035] Next, in ST310, gain codebook search is performed. The gain codebook search is to select from the gain codebook a pair of the adaptive codebook gain and random codebook gain by which the adaptive code vector determined in ST308 and the random code vector determined in ST309 are respectively multiplied. The excitation vector signal is generated by adding the adaptive code vector multiplied by the adaptive codebook gain and the random code vector multiplied by the random codebook gain. The pair of the adaptive codebook gain and random codebook gain is selected from the gain codebook so as to minimize an error between a signal obtained by filtering the generated excitation vector signal with the perceptual weighted synthesis filter constructed in ST306, and the signal obtained by filtering the preprocessed input data with the perceptual weighting filter constructed in ST305.
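The gain codebook search can be sketched as a direct minimization over the stored gain pairs. The sketch assumes that the two code vectors have already been passed through the perceptual weighted synthesis filter separately (valid because the filter is linear, so the filtered sum equals the weighted sum of the filtered vectors); the function name and argument names are hypothetical.

```python
def search_gain_codebook(target, adaptive_syn, random_syn, gain_codebook):
    """Return the index of the (Ga, Gs) pair that minimizes the squared error.

    target: perceptually weighted input for this subframe.
    adaptive_syn, random_syn: the adaptive and random code vectors after the
    perceptual weighted synthesis filter.
    gain_codebook: list of (Ga, Gs) pairs.
    """
    best_index, best_err = None, float("inf")
    for index, (ga, gs) in enumerate(gain_codebook):
        err = sum((t - ga * a - gs * c) ** 2
                  for t, a, c in zip(target, adaptive_syn, random_syn))
        if err < best_err:
            best_index, best_err = index, err
    return best_index
```

Searching the pair jointly, rather than each gain on its own, is what a vector-quantized gain codebook implies; the selected index corresponds to the code G.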

[0036] Next, in ST311, the excitation vector signal is generated. The excitation vector signal is generated by adding a vector obtained by multiplying the adaptive code vector selected in ST308 by the adaptive codebook gain selected in ST310 and a vector obtained by multiplying the random code vector selected in ST309 by the random codebook gain selected in ST310.
[0037] Next, in ST312, the update of the memory used in a loop of the subframe processing is performed. Examples specifically performed are the update of the adaptive codebook, and the update of states of the perceptual weighting filter and perceptual weighted synthesis filter.

[0038] In ST305 to ST312, the processing is performed on a subframe-by-subframe basis.
[0039] Next, in ST313, the update of memory used in a loop of the frame processing is performed. Examples specifically performed are the update of states of the filter used in the preprocessing section, the update of the quantized LPC buffer (in the case where the inter-frame predictive quantization of LPC is performed), and the update of the input data buffer.

[0040] Next, in ST314, coded data is output. The coded data is output to a transmission path while being subjected to bit stream processing and multiplexing processing corresponding to the form of the transmission.

[0041] In ST302 to 304 and ST313 to 314, the processing is performed on a frame-by-frame basis. Further, the processing on a frame-by-frame basis and subframe-by-subframe basis is iterated until the input data is consumed.



(Second embodiment) 

[0042] FIG. 2 is a block diagram illustrating a config- 
uration of a speech decoding apparatus according to 
the second embodiment of the present invention. 
[0043] The code L representing quantized LPC, code S representing a random code vector, code P representing an adaptive code vector, and code G representing gain information, each transmitted from a coder, are respectively input to LPC decoder 201, random codebook 203, adaptive codebook 204 and gain codebook 205.

[0044] LPC decoder 201 decodes the quantized 
LPC from the code L to output to mode selector 202 and 
synthesis filter 209. 

[0045] Mode selector 202 determines a mode for random codebook 203 and postprocessing section 211 using the quantized LPC input from LPC decoder 201, and outputs mode information M to random codebook 203 and postprocessing section 211. Mode selector 202 also stores previously input information on quantized LPC, and performs the selection of mode using both the characteristics of the evolution of quantized LPC between frames and the characteristics of the quantized LPC in the current frame. There are at least two types of modes, examples of which are a mode corresponding to a voiced speech segment, a mode corresponding to an unvoiced speech segment, and a mode corresponding to a stationary noise segment. Further, as information for use in selecting a mode, it is not necessary to use the quantized LPC themselves, and it is more effective to use converted parameters such as the quantized LSP, reflective coefficients and linear prediction residual power.

[0046] Random codebook 203 stores a predetermined number of random code vectors with different shapes, and outputs the random code vector designated by the random codebook index obtained by decoding the input code S. This random codebook 203 has at least two types of modes. For example, random codebook 203 is configured to generate a pulse-like random code vector in the mode corresponding to a voiced speech segment, and a noise-like random code vector in the modes corresponding to an unvoiced speech segment and stationary noise segment. The random code vector output from random codebook 203 is generated with the single mode selected in mode selector 202 from among the at least two types of modes described above, and is multiplied by the random codebook gain Gs in multiplier 206 to be output to adder 208.

[0047] Adaptive codebook 204 performs buffering while sequentially updating the previously generated excitation vector signal, and generates an adaptive code vector using the adaptive codebook index (pitch period (pitch lag)) obtained by decoding the input code P. The adaptive code vector generated in adaptive codebook 204 is multiplied by the adaptive codebook gain Ga in multiplier 207, and then output to adder 208.
[0048] Gain codebook 205 stores a predetermined number of sets of the adaptive codebook gain Ga and random codebook gain Gs (gain vectors), and outputs the adaptive codebook gain component Ga and random codebook gain component Gs of the gain vector designated by the gain codebook index Gi obtained by decoding the input code G respectively to multipliers 207 and 206.

[0049] Adder 208 adds the random code vector and the adaptive code vector respectively input from multipliers 206 and 207 to generate the excitation vector signal, and outputs the generated excitation vector signal to synthesis filter 209 and adaptive codebook 204.
[0050] Synthesis filter 209 is an LPC synthesis filter constructed using the quantized LPC input from LPC decoder 201. With the constructed synthesis filter, the filtering processing is performed on the excitation vector signal input from adder 208, and the resultant signal is output to post filter 210.
[0051] Post filter 210 performs, on the synthesized signal input from synthesis filter 209, processing to improve the subjective quality of speech signals, such as pitch emphasis, formant emphasis, spectral tilt compensation and gain adjustment, and outputs the result to postprocessing section 211.

[0052] Postprocessing section 211 adaptively performs, on the signal input from post filter 210, processing to improve the subjective quality of the stationary noise segment, such as inter-frame smoothing processing of spectral amplitude and randomizing processing of spectral phase, using the mode information M input from mode selector 202. For example, the smoothing processing and randomizing processing are rarely performed in the modes corresponding to the voiced speech segment and unvoiced speech segment, and such processing is adaptively performed in the mode corresponding to, for example, the stationary noise segment. The postprocessed signal is output as output data such as a digital decoded speech signal.
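The two operations named above, inter-frame smoothing of spectral amplitude and randomizing of spectral phase, can be sketched with a naive discrete Fourier transform. Everything beyond those two operations is an assumption for illustration: the class name, the first-order recursive smoothing with a factor of 0.9, the complete bypass in non-noise modes (the text says only that the processing is rarely performed there), the reset of the smoothing memory on bypass, and keeping the DC and Nyquist phases so that the output stays real.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft_real(spec):
    """Inverse DFT returning the real part of each output sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


class StationaryNoisePostprocessor:
    def __init__(self, smoothing=0.9, seed=0):
        self.smoothing = smoothing  # hypothetical smoothing factor
        self.prev_amp = None
        self.rng = random.Random(seed)

    def process(self, frame, mode):
        if mode != "stationary_noise":
            self.prev_amp = None  # assumption: restart smoothing after speech
            return list(frame)
        n = len(frame)
        half = n // 2
        spec = dft(frame)
        amp = [abs(c) for c in spec[:half + 1]]
        if self.prev_amp is not None and len(self.prev_amp) == len(amp):
            s = self.smoothing
            amp = [s * p + (1.0 - s) * a
                   for p, a in zip(self.prev_amp, amp)]
        self.prev_amp = amp
        out = [0j] * n
        for k in range(half + 1):
            self_conjugate = k == 0 or (n % 2 == 0 and k == half)
            if self_conjugate:
                phase = cmath.phase(spec[k])  # keep DC and Nyquist phase
            else:
                phase = self.rng.uniform(-math.pi, math.pi)
            out[k] = cmath.rect(amp[k], phase)
            if not self_conjugate:
                out[n - k] = out[k].conjugate()  # keep the signal real
        return idft_real(out)
```

On the first noise frame no smoothing history exists, so the amplitude spectrum is preserved and only the phase changes; on later noise frames the amplitude follows a smoothed trajectory, which is what suppresses frame-to-frame fluctuation of the background noise.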
[0053] Although in this embodiment the mode information M output from mode selector 202 is used in both the mode selection for random codebook 203 and the mode selection for postprocessing section 211, using the mode information M for only one of the mode selections is also effective. In this case, only the corresponding one performs the multimode processing.
[0054] The flow of the processing of the speech decoding method in the above-mentioned embodiment is next described with reference to FIG.4. This explanation describes the case where, in the speech decoding processing, the processing is performed for each processing unit with a predetermined time length (a frame with a time length of a few tens of msec), and the processing is further performed for each shorter processing unit (subframe) obtained by dividing the frame into an integer number of parts.
[0055] In ST401, all the memories such as the contents of the adaptive codebook, synthesis filter memory and output buffer are cleared.

[0056] Next, in ST402, coded data is decoded. Specifically, multiplexed received signals are demultiplexed, and the received signals constructed in bitstreams are converted into codes respectively representing quantized LPC, adaptive code vector, random code vector and gain information.

[0057] Next, in ST403, the LPC are decoded. The LPC are decoded from the code representing the quantized LPC obtained in ST402 with the reverse procedure of the quantization of the LPC described in the first embodiment.

[0058] Next, in ST404, the synthesis filter is con- 
structed with the LPC decoded in ST403. 
[0059] Next, in ST405, the mode selection for the random codebook and postprocessing is performed using the static and dynamic characteristics of the LPC decoded in ST403. Examples of specifically used characteristics are an evolution of quantized LSP, reflective coefficients calculated from the quantized LPC, and prediction residual power. The decoding of the random code vector and the postprocessing are performed according to the mode selected in this step. There are at least two types of modes, which are, for example, comprised of a mode corresponding to a voiced speech segment, a mode corresponding to an unvoiced speech segment and a mode corresponding to a stationary noise segment.
[0060] Next, in ST406, the adaptive code vector is 
decoded. The adaptive code vector is decoded by 
decoding a position from which the adaptive code vector 
is fetched from the adaptive codebook using the code 
representing the adaptive code vector, and fetching the 
adaptive code vector from the obtained position. 
[0061] Next, in ST407, the random code vector is decoded. The random code vector is decoded by decoding the random codebook index from the code representing the random code vector, and retrieving the random code vector corresponding to the obtained index from the random codebook. When other processing such as pitch period processing of the random code vector is applied, a decoded random code vector is obtained after further being subjected to the pitch period processing. This random codebook has at least two types of modes. For example, this random codebook is configured to generate a pulse-like random code vector in the mode corresponding to a voiced speech segment, and a noise-like random code vector in the modes corresponding to an unvoiced speech segment and stationary noise segment.

[0062] Next, in ST408, the adaptive codebook gain and random codebook gain are decoded. The gain information is decoded by decoding the gain codebook index from the code representing the gain information, and retrieving the pair of the adaptive codebook gain and random codebook gain indicated by the obtained index from the gain codebook.






EP 1 024 477 A1 






[0063] Next, in ST409, the excitation vector signal is generated. The excitation vector signal is generated by adding a vector obtained by multiplying the adaptive code vector selected in ST406 by the adaptive codebook gain selected in ST408 and a vector obtained by multiplying the random code vector selected in ST407 by the random codebook gain selected in ST408.
[0064] Next, in ST410, a decoded signal is synthesized. The excitation vector signal generated in ST409 is filtered with the synthesis filter constructed in ST404, and thereby the decoded signal is synthesized.
[0065] Next, in ST411, the postfiltering processing is performed on the decoded signal. The postfiltering processing is comprised of processing to improve the subjective quality of decoded signals, in particular, decoded speech signals, such as pitch emphasis processing, formant emphasis processing, spectral tilt compensation processing and gain adjustment processing.

[0066] Next, in ST412, the final postprocessing is performed on the decoded signal subjected to the postfiltering processing. The postprocessing is comprised of processing to improve the subjective quality of the stationary noise segment in the decoded signal, such as inter-(sub)frame smoothing processing of spectral amplitude and randomizing processing of spectral phase, and the processing corresponding to the mode selected in ST405 is performed. For example, the smoothing processing and randomizing processing are rarely performed in the modes corresponding to the voiced speech segment and unvoiced speech segment, and such processing is performed in the mode corresponding to the stationary noise segment. The signal generated in this step becomes output data.

[0067] Next, in ST413, the update of the memory used in a loop of the subframe processing is performed. Specifically performed are the update of the adaptive codebook, and the update of states of filters used in the postfiltering processing.

[0068] In ST404 to ST413, the processing is performed on a subframe-by-subframe basis.
[0069] Next, in ST414, the update of memory used in a loop of the frame processing is performed. Specifically performed are the update of the quantized (decoded) LPC buffer (in the case where the inter-frame predictive quantization of LPC is performed), and the update of the output data buffer.

[0070] In ST402 to 403 and ST414, the processing is performed on a frame-by-frame basis. Further, the processing on a frame-by-frame basis is iterated until the coded data is consumed.

(Third embodiment)

[0071] FIG.5 is a block diagram illustrating a speech signal transmission apparatus and reception apparatus respectively provided with the speech coding apparatus of the first embodiment and the speech decoding apparatus of the second embodiment. FIG.5A illustrates the transmission apparatus, and FIG.5B illustrates the reception apparatus.

[0072] In the speech signal transmission apparatus
in FIG.5A, speech input apparatus 501 converts speech
into an electric analog signal and outputs it to A/D
converter 502. A/D converter 502 converts the analog
speech signal into a digital speech signal and outputs
it to speech coder 503. Speech coder 503 performs speech
coding processing on the input signal, and outputs the
coded information to RF modulator 504. RF modulator 504
performs modulation, amplification and code spreading on
the coded speech signal information so that it can be
transmitted as a radio signal, and outputs the resultant
signal to transmission antenna 505. Finally, the radio
signal (RF signal) 506 is transmitted from transmission
antenna 505.

[0073] On the other hand, the reception apparatus
in FIG.5B receives the radio signal (RF signal) 506 with
reception antenna 507, and outputs the received signal
to RF demodulator 508. RF demodulator 508 performs
processing such as code despreading and demodulation to
convert the radio signal into coded information, and
outputs the coded information to speech decoder 509.
Speech decoder 509 performs decoding processing on the
coded information and outputs a digital decoded speech
signal to D/A converter 510. D/A converter 510 converts
the digital decoded speech signal output from speech
decoder 509 into an analog decoded speech signal and
outputs it to speech output apparatus 511. Finally,
speech output apparatus 511 converts the electric analog
decoded speech signal into decoded speech and outputs it.

[0074] It is possible to use the above-mentioned
transmission apparatus and reception apparatus as a
mobile station apparatus and base station apparatus in
mobile communication apparatuses such as portable
telephones. In addition, the medium that transmits the
information is not limited to the radio signal described
in this embodiment; optical signals or cable
transmission paths may also be used.

[0075] Further, the speech coding apparatus described
in the first embodiment, the speech decoding apparatus
described in the second embodiment, and the transmission
apparatus and reception apparatus described in the third
embodiment may be implemented as software by recording
the corresponding program on a recording medium such as
a magnetic disk, magneto-optical disk or ROM cartridge.
A personal computer using such a recording medium can
then serve as the speech coding/decoding apparatus and
transmission/reception apparatus.

(Fourth embodiment)

[0076] The fourth embodiment describes examples
of configurations of mode selectors 105 and 202 in the
above-mentioned first and second embodiments.
[0077] FIG.6 illustrates a mode selector according
to the fourth embodiment.

[0078] The mode selector according to this embodiment
is provided with dynamic characteristic extraction
section 601 that extracts the dynamic characteristic of
quantized LSP parameters, and first and second static
characteristic extraction sections 602 and 603 that
extract the static characteristics of quantized LSP
parameters.

[0079] In dynamic characteristic extraction section
601, AR type smoothing section 604 receives the input
quantized LSP parameter and performs smoothing
processing. AR type smoothing section 604 treats the
quantized LSP parameter of each order, which is input at
each unit processing time, as time-series data and
applies the smoothing expressed by the following
equation (1):

\[ Ls[i] = (1-\alpha) \times Ls[i] + \alpha \times L[i], \quad i = 1, 2, \ldots, M, \quad 0 < \alpha < 1 \qquad (1) \]

Ls[i]: ith order smoothed quantized LSP parameter
L[i]: ith order quantized LSP parameter
α: smoothing coefficient
M: LSP analysis order

[0080] In equation (1), the value of α is set at about
0.7 so that the smoothing is not too strong. The
smoothed quantized LSP parameter obtained with equation
(1) is branched, one branch being input to adder 606
through delay section 605 and the other being input
directly to adder 606.
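As an illustration of equation (1), the following minimal Python sketch applies one AR smoothing update to a vector of quantized LSP parameters. The function name `ar_smooth`, the list-based representation and the sample values are assumptions made for this example; they do not come from the specification.

```python
def ar_smooth(smoothed, lsp, alpha):
    """Apply one update of equation (1) to every LSP order.

    smoothed: smoothed values Ls[i] from the last unit processing time
    lsp:      quantized LSP values L[i] at the current unit processing time
    alpha:    smoothing coefficient, 0 < alpha < 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if len(smoothed) != len(lsp):
        raise ValueError("both vectors must have M entries")
    # Ls[i] <- (1 - alpha) * Ls[i] + alpha * L[i]
    return [(1.0 - alpha) * s + alpha * x for s, x in zip(smoothed, lsp)]


# Hypothetical 4th-order example; the LSP values are made up.
state = [0.10, 0.25, 0.40, 0.60]
current = [0.12, 0.25, 0.38, 0.62]
state = ar_smooth(state, current, alpha=0.7)
```

Because the coefficient weights the current input, a value near 0.7 lets the output follow the input closely, whereas a small value such as 0.05 turns the same update into a long-term average.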

[0081] Delay section 605 delays the input smoothed
quantized LSP parameter by one unit processing time and
outputs it to adder 606.

[0082] Adder 606 receives the smoothed quantized
LSP parameter at the current unit processing time and
the smoothed quantized LSP parameter at the last unit
processing time, and calculates the variation
(difference) between the two. The variation is output
for each order of the LSP parameter. The result
calculated by adder 606 is output to square sum
calculation section 607.

[0083] Square sum calculation section 607 calculates
the sum, over all orders, of the squared variation
between the smoothed quantized LSP parameter at the
current unit processing time and the smoothed quantized
LSP parameter at the last unit processing time.
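The chain of sections 604 to 607 can be pictured with the following Python sketch, which keeps the previous smoothed vector as state and returns the square sum of the change at each call. The class name `SmoothedLspChange` and the start-up behaviour (the first input initialises the state and yields zero) are assumptions of this sketch, not details given in the specification.

```python
class SmoothedLspChange:
    """Tracks smoothed LSPs and returns the square sum of their change."""

    def __init__(self, alpha=0.7):
        self.alpha = alpha
        self.smoothed = None  # smoothed LSP vector at the last unit time

    def update(self, lsp):
        if self.smoothed is None:
            # Assumption: start the smoother from the first input, so the
            # first reported change is zero.
            self.smoothed = list(lsp)
            return 0.0
        previous = self.smoothed                       # delay section 605
        current = [(1.0 - self.alpha) * s + self.alpha * x
                   for s, x in zip(previous, lsp)]     # smoothing section 604
        diff = [c - p for c, p in zip(current, previous)]  # adder 606
        self.smoothed = current
        return sum(d * d for d in diff)                # square sum section 607


tracker = SmoothedLspChange(alpha=0.5)
tracker.update([0.0, 0.0])           # initialises the state
change = tracker.update([1.0, 0.0])  # smoothed vector moves to [0.5, 0.0]
```

A large return value indicates that the spectral envelope is changing quickly, which is the cue the mode selector uses for speech.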
[0084] In dynamic characteristic extraction section
601, the quantized LSP parameter is also input to delay
section 608 in parallel with AR type smoothing section
604. Delay section 608 delays the input quantized LSP
parameter by one unit processing time and outputs it to
AR type average calculation section 611 through switch
609.

[0085] Switch 609 is connected when the mode
information output from delay section 610 indicates the
noise mode, so that the quantized LSP parameter output
from delay section 608 is input to AR type average
calculation section 611.

[0086] Delay section 610 receives the mode
information output from mode determination section 621,
delays it by one unit processing time, and outputs it to
switch 609.
[0087] AR type average calculation section 611
calculates the average LSP parameter over the noise
region based on equation (1), in the same way as AR type
smoothing section 604, and outputs it to adder 612. Here
the value of α in equation (1) is set at about 0.05 to
perform extremely strong smoothing, so that the
long-time average of the LSP parameter is calculated.

[0088] Adder 612 calculates, for each order, the
difference between the quantized LSP parameter at the
current unit processing time and the average quantized
LSP parameter in the noise region calculated by AR type
average calculation section 611.
[0089] Square sum calculation section 613 receives
the difference information of quantized LSP parameters
output from adder 612, calculates the sum of the squared
differences over all orders, and outputs it to speech
region detection section 619.
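The interplay of delay sections 608 and 610, switch 609, average calculation section 611, adder 612 and square sum section 613 can be sketched as follows. The class `NoiseLspDistance`, its method names and the caller-supplied initial average are hypothetical; the specification does not state how the average is initialised.

```python
class NoiseLspDistance:
    """Long-term noise LSP average and distance of the current LSP to it."""

    def __init__(self, initial_average, alpha=0.05):
        self.average = list(initial_average)  # assumption: supplied by caller
        self.alpha = alpha
        self._delayed_lsp = None        # contents of delay section 608
        self._delayed_is_noise = False  # contents of delay section 610

    def distance(self, lsp):
        # Switch 609: the delayed LSP enters the average only when the
        # delayed mode information indicates the noise mode.
        if self._delayed_is_noise and self._delayed_lsp is not None:
            self.average = [(1.0 - self.alpha) * a + self.alpha * x
                            for a, x in zip(self.average, self._delayed_lsp)]
        self._delayed_lsp = list(lsp)
        # Adder 612 and square sum section 613.
        return sum((x - a) ** 2 for x, a in zip(lsp, self.average))

    def set_mode(self, is_noise):
        """Store the mode decided for the frame just processed."""
        self._delayed_is_noise = bool(is_noise)


tracker = NoiseLspDistance(initial_average=[0.0], alpha=0.5)
d1 = tracker.distance([1.0])   # no earlier frame yet, average unchanged
tracker.set_mode(True)         # that frame was classified as noise
d2 = tracker.distance([1.0])   # average first moves to 0.5
```

Gating the update on the delayed mode keeps speech frames out of the noise average, so the distance stays meaningful as a speech indicator.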

[0090] Dynamic characteristic extraction section 601
for the quantized LSP parameter is comprised of
components 604 to 613 as described above.
[0091] First static characteristic extraction section
602 calculates the linear prediction residual power from
the quantized LSP parameter in linear prediction
residual power calculation section 614, and further
calculates the interval (region) between neighboring
orders of the quantized LSP parameters, as expressed in
the following equation (2), in neighboring LSP region
calculation section 615:

\[ Ld[i] = L[i+1] - L[i], \quad i = 1, 2, \ldots, M-1 \qquad (2) \]

L[i]: ith order quantized LSP parameter

[0092] The value calculated in neighboring LSP
region calculation section 615 is provided to variance
calculation section 616. Variance calculation section
616 calculates the variance of the quantized LSP
parameter intervals output from neighboring LSP region
calculation section 615. When the variance is
calculated, excluding the data at the lowest frequency
(Ld[1]) instead of using all the LSP interval data makes
it possible to reflect the characteristics of peaks and
valleys other than the peak at the lowest frequency.
When a stationary noise whose levels are lifted in the
low frequency band is passed through a high-pass filter,
a peak of the spectrum always appears around the cut-off
frequency of the filter, so it is effective to cancel
the information of such a peak. In other words, it is
possible to extract the characteristics of the peaks and
valleys of the spectral envelope of an input signal, and
therefore to extract the static characteristics used to
detect a region that is highly likely to be a speech
region. Further, this configuration makes it possible to
separate the speech region and the stationary noise
region with high accuracy.
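Equation (2) and the variance calculation that skips the lowest interval can be illustrated with the short Python sketch below. The function names, the use of the population variance and the sample LSP values are assumptions of this example; the specification does not say which variance estimator is used.

```python
def lsp_intervals(lsp):
    """Equation (2): Ld[i] = L[i+1] - L[i] for neighbouring LSP orders."""
    return [hi - lo for lo, hi in zip(lsp, lsp[1:])]


def interval_variance(lsp, skip_lowest=True):
    """Variance of the LSP intervals, optionally excluding Ld[1]."""
    intervals = lsp_intervals(lsp)
    if skip_lowest:
        intervals = intervals[1:]   # drop the lowest-frequency interval
    if not intervals:
        raise ValueError("not enough LSP orders")
    mean = sum(intervals) / len(intervals)
    # Population variance (an assumption of this sketch).
    return sum((d - mean) ** 2 for d in intervals) / len(intervals)


# Made-up LSP vectors: evenly spaced (noise-like) and uneven (speech-like).
flat = [0.1, 0.2, 0.3, 0.4, 0.5]
peaky = [0.05, 0.1, 0.5, 0.6, 1.0]
flat_var = interval_variance(flat)
peaky_var = interval_variance(peaky)
```

Evenly spaced LSPs give a variance near zero, while LSPs clustered around formants give a clearly larger value.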

[0093] First static characteristic extraction section
602 for the quantized LSP parameter is comprised of
components 614, 615 and 616 as described above.
[0094] In second static characteristic extraction
section 603, reflective coefficient calculation section
617 converts the quantized LSP parameter into a
reflective coefficient and outputs it to voiced/unvoiced
judgment section 620. Concurrently, linear prediction
residual power calculation section 618 calculates the
linear prediction residual power from the quantized LSP
parameter and outputs it to voiced/unvoiced judgment
section 620.

[0095] In addition, since linear prediction residual
power calculation section 618 is the same as linear
prediction residual power calculation section 614, a
single component may be shared as sections 614 and 618.

[0096] Second static characteristic extraction section
603 for the quantized LSP parameter is comprised of
components 617 and 618 as described above.
[0097] Outputs from dynamic characteristic extraction
section 601 and first static characteristic extraction
section 602 are provided to speech region detection
section 619. Speech region detection section 619
receives the variation amount of the smoothed quantized
LSP parameter input from square sum calculation section
607, the distance between the average quantized LSP
parameter of the noise segment and the current quantized
LSP parameter input from square sum calculation section
613, the quantized linear prediction residual power
input from linear prediction residual power calculation
section 614, and the variance information of the
neighboring LSP interval data input from variance
calculation section 616. Using this information, speech
region detection section 619 judges whether or not the
input signal (or decoded signal) at the current unit
processing time is a speech region, and outputs the
judged result to mode determination section 621. The
specific method for judging whether the input signal is
a speech region is described later using FIG.8.
[0098] On the other hand, the output from second
static characteristic extraction section 603 is provided
to voiced/unvoiced judgment section 620.
Voiced/unvoiced judgment section 620 receives the
reflective coefficient input from reflective coefficient
calculation section 617 and the quantized linear
prediction residual power input from linear prediction
residual power calculation section 618. Using this
information, voiced/unvoiced judgment section 620 judges
whether the input signal (or decoded signal) at the
current unit processing time is a voiced region or an
unvoiced region, and outputs the judged result to mode
determination section 621. The specific voiced/unvoiced
judgment method is described later using FIG.9.
[0099] Mode determination section 621 receives
the judged result output from speech region detection
section 619 and the judged result output from
voiced/unvoiced judgment section 620, and using this
information, determines and outputs the mode of the
input signal (or decoded signal) at the current unit
processing time. The specific mode classifying method is
described later using FIG.10.

[0100] In addition, although AR type sections are
used as the smoothing section and average calculation
section in this embodiment, the smoothing and average
calculation may be performed by other methods.

[0101] The details of the speech region judgment
method in the above-mentioned embodiment are next
explained with reference to FIG.8.
[0102] First, in ST801, the first dynamic parameter
(Para1) is calculated. Specifically, the first dynamic
parameter is the variation amount of the quantized LSP
parameter per unit processing time, and is expressed by
the following equation (3):



\[ D(t) = \sum_{i=1}^{M} \left( LS_i(t) - LS_i(t-1) \right)^2 \qquad (3) \]

LSi(t): smoothed quantized LSP at time t



[0103] Next, in ST802, it is checked whether or not
the first dynamic parameter is larger than a
predetermined threshold Th1. When the parameter exceeds
the threshold Th1, the variation amount of the quantized
LSP parameter is large, and it is therefore judged that
the input signal is a speech region. On the other hand,
when the parameter is equal to or less than the
threshold Th1, the variation amount of the quantized LSP
parameter is small, and the processing proceeds to ST803
and then to the judgment steps that use the other
parameters.

[0104] When the first dynamic parameter is equal to or
less than the threshold Th1 in ST802, the processing
proceeds to ST803, where a counter indicating the number
of times the stationary noise region has previously been
judged is checked. The initial value of the counter is
0, and the counter is incremented by 1 for each unit
processing time judged to be the stationary noise region
by this mode determination method. In ST803, when the
counter value is equal to or less than a predetermined
threshold ThC, the processing proceeds to ST804, where
whether or not the input signal is a speech region is
judged using the static parameters. On the other hand,
when the counter value exceeds the threshold ThC, the
processing proceeds to ST806, where whether or not the
input signal is a speech region is judged using the
second dynamic parameter.
[0105] Two types of parameters are calculated in
ST804. One is the linear prediction residual power
(Para3) calculated from the quantized LSP parameters,
and the other is the variance of the difference
information of neighboring orders of quantized LSP
parameters (Para4). The linear prediction residual power
is obtained by converting the quantized LSP parameters
into linear predictive coefficients and using the
relation equation in the Levinson-Durbin algorithm. It
is known that the linear prediction residual power tends
to be higher in an unvoiced segment than in a voiced
segment, and therefore the linear prediction residual
power is used as a criterion for the voiced/unvoiced
judgment. The difference information of neighboring
orders of quantized LSP parameters is expressed by
equation (2), and the variance of such data is obtained.
However, depending on the type of noise and on bandwidth
limitation, a spectral peak may exist in the lowest
frequency band. It is therefore preferable to obtain the
variance using the data from i=2 to M-1 (M is the
analysis order) in equation (2), without using the
difference information of neighboring orders at the low
frequency edge (i=1 in equation (2)). In a speech
signal, since there are about three formants in the
telephone band (200 Hz to 3.4 kHz), the LSP intervals
have wide portions and narrow portions, and therefore
the variance of the interval data tends to be large. On
the other hand, in stationary noise, since there is no
formant structure, the LSP intervals are usually
relatively equal, and therefore the variance tends to be
small. By using these characteristics, it is possible to
judge whether or not the input signal is a speech
region. However, as described previously, some types of
noise have a spectral peak in a low frequency band. In
this case, the LSP interval at the lowest frequency band
becomes narrow, so that the variance obtained from all
the neighboring LSP difference data shows a smaller
difference between the presence and absence of the
formant structure, thereby lowering the judgment
accuracy. Accordingly, obtaining the variance with the
neighboring LSP difference information at the low
frequency edge eliminated prevents such deterioration of
the accuracy. However, since such a static parameter has
a lower judgment ability than the dynamic parameters, it
is preferable to use the static parameter as
supplementary information. The two types of parameters
calculated in ST804 are used in ST805.
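The specification does not spell out which Levinson-Durbin relation is used. One standard relation is that the residual power, normalised by the signal power, equals the product of (1 - k_i^2) over the reflection coefficients k_i, which can be recovered from the linear predictive coefficients by the step-down recursion. The Python sketch below assumes that relation, assumes the LSP-to-LPC conversion has already been done, and assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M; sign conventions for reflection coefficients differ between texts, and the function names are hypothetical.

```python
def reflection_coefficients(lpc):
    """Step-down recursion: LPC [a_1..a_M] -> reflection coeffs [k_1..k_M]."""
    a = list(lpc)
    ks = []
    for m in range(len(a), 0, -1):
        k = a[m - 1]                      # k_m is the last coefficient
        if abs(k) >= 1.0:
            raise ValueError("unstable filter: |k| >= 1")
        ks.append(k)
        denom = 1.0 - k * k
        # Coefficients of the order m-1 predictor.
        a = [(a[i] - k * a[m - 2 - i]) / denom for i in range(m - 1)]
    ks.reverse()
    return ks


def normalized_residual_power(lpc):
    """Residual power relative to signal power: product of (1 - k_i^2)."""
    power = 1.0
    for k in reflection_coefficients(lpc):
        power *= 1.0 - k * k
    return power


# Second-order example built from k_1 = 0.5 and k_2 = -0.3 (made-up values).
ks = reflection_coefficients([0.35, -0.3])
power = normalized_residual_power([0.35, -0.3])
```

The same recursion also yields the first-order reflective coefficient used by the voiced/unvoiced judgment, which is one reason a shared helper is convenient.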

[0106] Next, in ST805, the two types of parameters
calculated in ST804 are compared with thresholds.
Specifically, in the case where the linear prediction
residual power (Para3) is equal to or less than a
threshold Th3 and the variance (Para4) of the
neighboring LSP interval data is equal to or more than a
threshold Th4, it is judged that the input signal is a
speech region. In other cases, it is judged that the
input signal is a stationary noise region (non-speech
region). When the stationary noise region is judged, the
value of the counter is incremented by 1.

[0107] In ST806, the second dynamic parameter
(Para2) is calculated. The second dynamic parameter
indicates the degree of similarity between the average
quantized LSP parameter in a previous stationary noise
region and the quantized LSP parameter at the current
unit processing time. Specifically, as expressed in
equation (4), it is obtained as the square sum of the
differences calculated for each order between the
above-mentioned two types of quantized LSP parameters:

\[ E(t) = \sum_{i=1}^{M} \left( L_i(t) - LA_i \right)^2 \qquad (4) \]

Li(t): quantized LSP at time t
LAi: average quantized LSP of a noise region

The obtained second dynamic parameter is compared
with the threshold in ST807.

[0108] Next, in ST807, it is determined whether or
not the second dynamic parameter exceeds the threshold
Th2. When the second dynamic parameter exceeds the
threshold Th2, the degree of similarity to the average
quantized LSP parameter in the previous stationary noise
region is low, and it is therefore judged that the input
signal is the speech region. When the second dynamic
parameter is equal to or less than the threshold Th2,
the degree of similarity to the average quantized LSP
parameter in the previous stationary noise region is
high, and it is therefore judged that the input signal
is the stationary noise region. The value of the counter
is incremented by 1 when the input signal is judged to
be the stationary noise region.
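The decision flow of ST802 to ST807 can be summarised in the following Python sketch. The function name, the argument order and every numeric value in the usage lines are hypothetical; the specification gives no threshold values. For simplicity the sketch takes all four parameters as inputs, although in the flow described above Para2 is needed only on the ST806 branch and Para3 and Para4 only on the ST804 branch.

```python
def judge_speech_region(para1, para2, para3, para4, noise_count,
                        th1, th2, th3, th4, thc):
    """Return (is_speech, updated_noise_count) for one unit processing time."""
    if para1 > th1:                                  # ST802
        return True, noise_count
    if noise_count <= thc:                           # ST803
        # ST804/ST805: static parameters (residual power, interval variance)
        is_speech = para3 <= th3 and para4 >= th4
    else:
        # ST806/ST807: distance to the long-term noise LSP average
        is_speech = para2 > th2
    if not is_speech:
        noise_count += 1                             # stationary noise judged
    return is_speech, noise_count


# Made-up thresholds, for illustration only.
TH = dict(th1=1.0, th2=0.5, th3=0.3, th4=0.01, thc=3)
result = judge_speech_region(0.1, 0.0, 0.5, 0.02, noise_count=0, **TH)
```

The counter acts as a confidence gate: the noise-average distance is trusted only after enough noise frames have been seen to make that average reliable.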

[0109] The details of the voiced/unvoiced region
judgment method in the above-mentioned embodiment are
next explained with reference to FIG.9.

[0110] First, in ST901, the first-order reflective
coefficient is calculated from the quantized LSP
parameter at the current unit processing time. The
reflective coefficient is calculated after the LSP
parameter is converted into the linear predictive
coefficient.

[0111] Next, in ST902, it is determined whether or
not the above-mentioned reflective coefficient exceeds
the first threshold Th1. When the coefficient exceeds
the threshold Th1, it is judged that the current unit
processing time is the unvoiced region, and the
voiced/unvoiced judgment processing is finished. When
the coefficient is equal to or less than the threshold
Th1, the voiced/unvoiced judgment processing is
continued.

[0112] When the region is not judged as the
unvoiced region in ST902, it is determined in ST903
whether or not the above-mentioned reflective
coefficient exceeds the second threshold Th2. When the
coefficient exceeds the threshold Th2, the processing
proceeds to ST905, and when the coefficient is equal to
or less than the threshold Th2, the processing proceeds
to ST904.

[0113] When the above-mentioned reflective coefficient
is equal to or less than the second threshold Th2 in
ST903, it is determined in ST904 whether or not the
above-mentioned reflective coefficient exceeds the third
threshold Th3. When the coefficient exceeds the
threshold Th3, the processing proceeds to ST907, and
when the coefficient is equal to or less than the
threshold Th3, the region is judged as the voiced
region, and the voiced/unvoiced judgment processing is
finished.
[0114] When the above-mentioned reflective coefficient
exceeds the second threshold Th2 in ST903, the linear
prediction residual power is calculated in ST905. The
linear prediction residual power is calculated after the
quantized LSP is converted into the linear predictive
coefficient.

[0115] In ST906, following ST905, it is determined
whether or not the above-mentioned linear prediction
residual power exceeds the threshold Th4. When the
power exceeds the threshold Th4, it is judged that the
region is the unvoiced region, and the voiced/unvoiced
judgment processing is finished. When the power is
equal to or less than the threshold Th4, it is judged
that the region is the voiced region, and the
voiced/unvoiced judgment processing is finished.
[0116] When the above-mentioned reflective coefficient
exceeds the third threshold Th3 in ST904, the linear
prediction residual power is calculated in ST907.
[0117] In ST908, following ST907, it is determined
whether or not the above-mentioned linear prediction
residual power exceeds the threshold Th5. When the
power exceeds the threshold Th5, it is judged that the
region is the unvoiced region, and the voiced/unvoiced
judgment processing is finished. When the power is
equal to or less than the threshold Th5, it is judged
that the region is the voiced region, and the
voiced/unvoiced judgment processing is finished.
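The threshold cascade of ST902 to ST908 can be sketched in Python as below. The function name and all numeric values are hypothetical. The outcome that is not "unvoiced" is returned as voiced here, and the threshold checked in ST908 is called `th5`; both namings are assumptions of this sketch.

```python
def judge_voiced(k1, residual_power, th1, th2, th3, th4, th5):
    """Return True for a voiced region, False for an unvoiced region.

    k1:             first-order reflective coefficient (ST901)
    residual_power: linear prediction residual power (ST905/ST907)
    """
    if k1 > th1:                       # ST902
        return False
    if k1 > th2:                       # ST903 -> ST905/ST906
        return residual_power <= th4
    if k1 > th3:                       # ST904 -> ST907/ST908
        return residual_power <= th5
    return True                        # ST904: coefficient at or below th3


# Made-up thresholds with th1 > th2 > th3, for illustration only.
TH = dict(th1=0.5, th2=0.0, th3=-0.5, th4=0.3, th5=0.2)
decision = judge_voiced(0.2, 0.1, **TH)
```

The residual power is consulted only in the two intermediate bands of the reflective coefficient, so a real implementation could defer computing it until one of those branches is taken.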
[0118] The mode determination method used in
mode determination section 621 is next explained with
reference to FIG.10.

[0119] First, in ST1001, the speech region detection
result is input. This step may itself be a block that
performs the speech region detection processing.
[0120] Next, in ST1002, it is determined whether the
mode is the stationary noise mode, based on the judgment
result on whether or not the region is the speech
region. When the region is the speech region, the
processing proceeds to ST1003. When the region is not
the speech region (that is, it is the stationary noise
region), the mode determination result indicative of the
stationary noise mode is output, and the mode
determination processing is finished.
[0121] When it is determined in ST1002 that the mode
is not the stationary noise mode, the voiced/unvoiced
judgment result is input in ST1003. This step may itself
be a block that performs the voiced/unvoiced
determination processing.
[0122] Following ST1003, the mode determination
is performed to determine whether the mode is the
voiced region mode or the unvoiced region mode based
on the voiced/unvoiced judgment result. When the
judgment result is indicative of the voiced region, the
mode determination result indicative of the voiced
region mode is output, and the mode determination
processing is finished. When the voiced/unvoiced
judgment result is indicative of the unvoiced region,
the mode determination result indicative of the unvoiced
region mode is output, and the mode determination
processing is finished. As described above, using the
speech region detection result and the voiced/unvoiced
judgment result, the input signal (or decoded signal) in
the current unit processing block is classified into one
of three modes.
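The three-way classification of FIG.10 reduces to a few lines of Python. The enumeration `Mode` and the function name are hypothetical names chosen for this sketch.

```python
from enum import Enum


class Mode(Enum):
    STATIONARY_NOISE = "stationary noise"
    VOICED = "voiced"
    UNVOICED = "unvoiced"


def determine_mode(is_speech, is_voiced):
    """Combine the two judgment results into one of three modes."""
    if not is_speech:                # ST1002: non-speech region
        return Mode.STATIONARY_NOISE
    # ST1003 onward: the voiced/unvoiced result selects the speech mode.
    return Mode.VOICED if is_voiced else Mode.UNVOICED


mode = determine_mode(is_speech=True, is_voiced=False)
```

Note that the voiced/unvoiced result is ignored for non-speech regions, which matches the order of the steps in the flow.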

(Fifth embodiment) 

[0123] FIG.7 is a block diagram illustrating a
configuration of a postprocessing section according to
the fifth embodiment of the present invention. The
postprocessing section is used in the speech signal
decoding apparatus described in the second embodiment in
combination with the mode selector described in the
fourth embodiment. The postprocessing section
illustrated in FIG.7 is provided with mode selection
switches 705, 708, 707 and 711, spectral amplitude
smoothing section 706, spectral phase randomizing
sections 709 and 710, and threshold setting sections 703
and 716.
[0124] Weighted synthesis filter 701 receives the
decoded LPC output from LPC decoder 201 in the
previously described speech decoding apparatus to
construct the perceptual weighted synthesis filter,
performs weighted filtering processing on the
synthesized speech signal output from synthesis filter
209 or post filter 210 in the speech decoding apparatus,
and outputs the result to FFT processing section 702.

[0125] FFT processing section 702 performs FFT
processing on the weighting-processed decoded signal
output from weighted synthesis filter 701, and outputs a
spectral amplitude WSAi to first threshold setting
section 703, first spectral amplitude smoothing section
706 and first spectral phase randomizing section 709.
[0126] First threshold setting section 703 calculates
the average of the spectral amplitude calculated in FFT
processing section 702 using all frequency signal
components, and using the calculated average as a
reference, outputs the threshold Th1 to first spectral
amplitude smoothing section 706 and first spectral
phase randomizing section 709.
[0127] FFT processing section 704 performs FFT
processing on the synthesized speech signal output
from synthesis filter 209 or post filter 210 in the
speech decoding apparatus, outputs the spectral
amplitude to mode selection switches 705 and 712, adder
715, and second spectral phase randomizing section
710, and further outputs the spectral phase to mode
selection switch 708.

[0128] Mode selection switch 705 receives the
mode information (Mode) output from mode selector
202 in the speech decoding apparatus and the difference
information (Diff) output from adder 715, and judges
whether the decoded signal at the current unit
processing time is the speech region or the stationary
noise region. Mode selection switch 705 connects to
mode selection switch 707 when it judges that the
decoded signal is the speech region, and connects to
first spectral amplitude smoothing section 706 when it
judges that the decoded signal is the stationary noise
region.

[0129] First spectral amplitude smoothing section
706 receives the spectral amplitude SAi output from
FFT processing section 704 through mode selection
switch 705, performs smoothing processing on the signal
components at the frequencies determined by the input
first threshold Th1 and weighted spectral amplitude
WSAi, and outputs the result to mode selection switch
707. Whether a signal component at a given frequency is
to be smoothed is determined by whether the weighted
spectral amplitude WSAi is equal to or less than the
first threshold Th1. In other words, the smoothing
processing of the spectral amplitude SAi is performed on
the signal component at each frequency i such that WSAi
is equal to or less than Th1. The smoothing processing
reduces the discontinuity in time of the spectral
amplitude caused by coding distortion. In the case where
the smoothing processing is performed with the AR type
expressed by equation (1), the coefficient α can be set
at about 0.1 when the number of FFT points is 128 and
the unit processing time is 10 ms.
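A minimal Python sketch of the thresholded amplitude smoothing follows. It assumes that the threshold is a scaled mean of the weighted amplitudes (the `scale` factor is hypothetical, since the text says only that the average is used as a reference) and that bins above the threshold are passed through unchanged; neither detail is stated in the specification.

```python
def first_threshold(wsa, scale=1.0):
    """Threshold derived from the mean weighted amplitude (scale assumed)."""
    return scale * sum(wsa) / len(wsa)


def smooth_low_bins(prev_smoothed, sa, wsa, th1, alpha=0.1):
    """AR-smooth the amplitude of every bin i with WSA_i <= Th1.

    prev_smoothed: smoothed amplitudes from the last unit processing time
    sa:            spectral amplitudes SA_i of the current decoded signal
    wsa:           weighted spectral amplitudes WSA_i
    """
    out = []
    for prev, amp, weighted in zip(prev_smoothed, sa, wsa):
        if weighted <= th1:
            # Same AR form as equation (1), applied across time.
            out.append((1.0 - alpha) * prev + alpha * amp)
        else:
            out.append(amp)   # assumption: strong bins are left untouched
    return out


# Two-bin toy example with made-up numbers.
wsa = [1.0, 3.0]
th1 = first_threshold(wsa)
smoothed = smooth_low_bins([10.0, 10.0], [0.0, 5.0], wsa, th1, alpha=0.1)
```

Restricting the smoothing to perceptually weak bins leaves spectral peaks intact while stabilising the noise floor over time.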
[0130] Like mode selection switch 705, mode
selection switch 707 receives the mode information
(Mode) output from mode selector 202 in the speech
decoding apparatus and the difference information (Diff)
output from adder 715, and judges whether the decoded
signal at the current unit processing time is the speech
region or the stationary noise region. Mode selection
switch 707 connects to mode selection switch 705 when
it judges that the decoded signal is the speech region,
and connects to first spectral amplitude smoothing
section 706 when it judges that the decoded signal is
the stationary noise region. The judgment result is the
same as that of mode selection switch 705. The output of
mode selection switch 707 is connected to IFFT
processing section 720.
[0131] Mode selection switch 708 is a switch whose
output is switched synchronously with mode selection
switch 705. Mode selection switch 708 receives the mode
information (Mode) output from mode selector 202 in the
speech decoding apparatus and the difference information
(Diff) output from adder 715, and judges whether the
decoded signal at the current unit processing time is
the speech region or the stationary noise region. Mode
selection switch 708 connects to second spectral phase
randomizing section 710 when it judges that the decoded
signal is the speech region, and connects to first
spectral phase randomizing section 709 when it judges
that the decoded signal is the stationary noise region.
The judgment result is the same as that of mode
selection switch 705. In other words, mode selection
switch 708 is connected to first spectral phase
randomizing section 709 when mode selection switch 705
is connected to first spectral amplitude smoothing
section 706, and mode selection switch 708 is connected
to second spectral phase randomizing section 710 when
mode selection switch 705 is connected to mode selection
switch 707.

[0132] First spectral phase randomizing section
709 receives the spectral phase SPi output from FFT
processing section 704 through mode selection switch
708, performs randomizing processing on the signal
components at the frequencies determined by the input
first threshold Th1 and weighted spectral amplitude
WSAi, and outputs the result to mode selection switch
711. The signal components to be randomized are
determined in the same way as the signal components to
be smoothed in first spectral amplitude smoothing
section 706. In other words, the randomizing processing
of the spectral phase SPi is performed on the signal
component at each frequency i such that WSAi is equal to
or less than Th1.

[0133] Second spectral phase randomizing section
710 receives the spectral phase SPi output from FFT
processing section 704 through mode selection switch
708, performs randomizing processing on the signal
components at the frequencies determined by the input
second threshold Th2i and spectral amplitude SAi, and
outputs the result to mode selection switch 711. The
signal components to be randomized are determined in a
manner similar to that in first spectral phase
randomizing section 709. In other words, the randomizing
processing of the spectral phase SPi is performed on the
signal component at each frequency i such that SAi is
equal to or less than Th2i.
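Both phase randomizing sections follow the same pattern: bins whose amplitude does not exceed a threshold get a new random phase, and all other bins keep their phase. The Python sketch below illustrates this with a per-bin threshold list, which covers the per-frequency threshold Th2i directly and a single threshold by repeating it. Drawing the phase uniformly from [-pi, pi] and the function name are assumptions; the specification does not state the distribution.

```python
import math
import random


def randomize_phase(phases, amplitudes, thresholds, rng=None):
    """Replace the phase of each bin whose amplitude is <= its threshold."""
    rng = rng or random.Random()
    out = []
    for phase, amp, th in zip(phases, amplitudes, thresholds):
        if amp <= th:
            # Assumption: uniform random phase over one full turn.
            out.append(rng.uniform(-math.pi, math.pi))
        else:
            out.append(phase)   # bins above the threshold keep their phase
    return out


# Toy example with made-up values and a fixed seed for repeatability.
phases = [0.1, 0.2, 0.3]
amps = [1.0, 5.0, 2.0]
new_phases = randomize_phase(phases, amps, [3.0] * 3, rng=random.Random(0))
```

Only the phase is altered, so the magnitude spectrum handed to the inverse FFT is unaffected by this step.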

[0134] Mode selection switch 711 operates
synchronously with mode selection switch 707. Like mode
selection switch 707, mode selection switch 711
receives the mode information (Mode) output from
mode selector 202 in the speech decoding apparatus and
the difference information (Diff) output from adder
715, and judges whether the decoded signal at the
current unit processing time is the speech region or the
stationary noise region. Mode selection switch 711
connects to second spectral phase randomizing section
710 when it judges that the decoded signal is the speech
region, and connects to first spectral phase randomizing
section 709 when it judges that the decoded signal is
the stationary noise region. The judgment result is the
same as that of mode selection switch 708. The output of
mode selection switch 711 is connected to IFFT
processing section 720.

[0135] Like mode selection switch 705, mode selection switch
712 receives the mode information (Mode) output from mode
selector 202 in the speech decoding apparatus and the
difference information (Diff) output from adder 715, and
judges whether the decoded signal in the current unit
processing time is the speech region or the stationary noise
region. When it is judged that the decoded signal is not the
speech region (that is, it is the stationary noise region),
mode selection switch 712 is connected so that the spectral
amplitude SAi output from FFT processing section 704 is passed
to second spectral amplitude smoothing section 713. When it is
judged that the decoded signal is the speech region, mode
selection switch 712 is disconnected, and therefore the
spectral amplitude SAi is not output to second spectral
amplitude smoothing section 713.
[0136] Second spectral amplitude smoothing section 713
receives the spectral amplitude SAi output from FFT processing
section 704 through mode selection switch 712, and performs
the smoothing processing on the signal components in all
frequency bands. This smoothing yields the average spectral
amplitude of the stationary noise region. The smoothing
processing is the same as that in first spectral amplitude
smoothing section 706. When mode selection switch 712 is
disconnected, section 713 performs no processing and outputs
the most recently processed smoothed spectral amplitude SSAi
of the stationary noise region. The smoothed spectral
amplitude SSAi processed in second spectral amplitude
smoothing section 713 is output to delay section 714, second
threshold setting section 716, and mode selection switch 718.

[0137] Delay section 714 delays the input SSAi, output from
second spectral amplitude smoothing section 713, by one unit
processing time and outputs it to adder 715.

[0138] Adder 715 calculates the difference between the
smoothed spectral amplitude SSAi of the stationary noise
region in the last unit processing time and the spectral
amplitude SAi in the current unit processing time, and outputs
it to mode selection switches 705, 707, 708, 711, 712, 718,
and 719.

[0139] Second threshold setting section 716 sets the
threshold Th2i, using as a reference the smoothed spectral
amplitude SSAi of the stationary noise region output from
second spectral amplitude smoothing section 713, and outputs
it to second spectral phase randomizing section 710.
[0140] Random spectral phase generating section 
717 outputs a randomly generated spectral phase to 
mode selection switch 719. 

[0141] Like mode selection switch 712, mode selection switch
718 receives the mode information (Mode) output from mode
selector 202 in the speech decoding apparatus and the
difference information (Diff) output from adder 715, and
judges whether the decoded signal in the current unit
processing time is the speech region or the stationary noise
region. When it is judged that the decoded signal is the
speech region, mode selection switch 718 is connected so that
the output of second spectral amplitude smoothing section 713
is passed to IFFT processing section 720. When it is judged
that the decoded signal is not the speech region (that is, it
is the stationary noise region), mode selection switch 718 is
disconnected, and therefore the output of second spectral
amplitude smoothing section 713 is not output to IFFT
processing section 720.

[0142] Mode selection switch 719 is switched synchronously
with mode selection switch 718. Like mode selection switch
718, mode selection switch 719 receives the mode information
(Mode) output from mode selector 202 in the speech decoding
apparatus and the difference information (Diff) output from
adder 715, and judges whether the decoded signal in the
current unit processing time is the speech region or the
stationary noise region. When it is judged that the decoded
signal is the speech region, mode selection switch 719 is
connected so that the output of random spectral phase
generating section 717 is passed to IFFT processing section
720. When it is judged that the decoded signal is not the
speech region (that is, it is the stationary noise region),
mode selection switch 719 is disconnected, and therefore the
output of random spectral phase generating section 717 is not
output to IFFT processing section 720.

[0143] IFFT processing section 720 receives the spectral
amplitude output from mode selection switch 707, the spectral
phase output from mode selection switch 711, the spectral
amplitude output from mode selection switch 718, and the
spectral phase output from mode selection switch 719, performs
IFFT processing, and outputs the processed signal. When mode
selection switches 718 and 719 are disconnected, IFFT
processing section 720 transforms the spectral amplitude input
from mode selection switch 707 and the spectral phase input
from mode selection switch 711 into a real part spectrum and
an imaginary part spectrum of the FFT, performs the IFFT
processing, and outputs the real part of the result as a time
signal. On the other hand, when mode selection switches 718
and 719 are connected, IFFT processing section 720 transforms
the spectral amplitude input from mode selection switch 707
and the spectral phase input from mode selection switch 711
into a first real part spectrum and a first imaginary part
spectrum, further transforms the spectral amplitude input from
mode selection switch 718 and the spectral phase input from
mode selection switch 719 into a second real part spectrum and
a second imaginary part spectrum, adds the two, and then
performs the IFFT processing. In other words, a third real
part spectrum is obtained by adding the first real part
spectrum to the second real part spectrum, a third imaginary
part spectrum is obtained by adding the first imaginary part
spectrum to the second imaginary part spectrum, and the IFFT
processing is performed using the third real part spectrum and
third imaginary part spectrum. When the spectra are added, the
second real part spectrum and second imaginary part spectrum
are attenuated by a constant factor or by an adaptively
controlled variable. For example, the second real part
spectrum is multiplied by 0.25 and then added to the first
real part spectrum, and the second imaginary part spectrum is
multiplied by 0.25 and then added to the first imaginary part
spectrum, thereby obtaining the third real part spectrum and
third imaginary part spectrum.
[0144] The postprocessing method described above is next
explained using FIGS. 11 and 12. FIG. 11 is a flowchart
illustrating specific processing of the postprocessing method
in this embodiment.
[0145] First, in ST1101, the FFT logarithmic spectral
amplitude (WSAi) of a perceptually weighted input signal
(decoded speech signal) is calculated.
[0146] Next, in ST1102, the first threshold Th1 is calculated.
Th1 is obtained by adding a constant k1 to the average of
WSAi. The value of k1 is determined empirically and is, for
example, about 0.4 in the common logarithmic domain. Assuming
that the number of FFT points is N and that the FFT spectral
amplitude is WSAi (i = 1, 2, ..., N), the average of WSAi is
obtained by averaging N/2 values of WSAi, because WSAi is
symmetric about the boundary between i = N/2 and i = N/2 + 1.
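The threshold computation in ST1102 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_threshold` is hypothetical, the input is taken to be a zero-based list of N logarithmic amplitudes, and the average is taken over the first N/2 entries as the paragraph describes.

```python
def first_threshold(wsa, k1=0.4):
    """Sketch of ST1102: Th1 = average of N/2 log amplitudes + k1.

    wsa is assumed to hold the N log-domain amplitudes WSAi; only the
    first half is averaged because the spectrum is symmetric.
    """
    half = len(wsa) // 2
    return sum(wsa[:half]) / half + k1
```

For a toy four-point spectrum `[1.0, 3.0, 3.0, 1.0]`, the first two values average to 2.0, so the threshold is about 2.4.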

[0147] Next, in ST1103, the FFT logarithmic spectral
amplitude (SAi) and the FFT spectral phase (SPi) of an input
signal (decoded speech signal) that is not perceptually
weighted are calculated.

[0148] Next, in ST1104, the spectral difference (Diff) is
calculated. The spectral difference is the total of the
residual spectra, each obtained by subtracting the average FFT
logarithmic spectral amplitude (SSAi) of the region previously
judged as the stationary noise region from the current FFT
logarithmic spectral amplitude (SAi). The spectral difference
Diff obtained in this step is a parameter for judging whether
the current power is larger than the average power of the
stationary noise region. When the current power is larger than
the average power of the stationary noise region, the region
contains a signal other than a stationary noise component, and
is therefore judged not to be the stationary noise region.
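As a rough illustration of ST1104, the spectral difference can be written as a sum of per-bin residuals. The function name `spectral_difference` is hypothetical, and summing over every bin passed in (rather than, say, only half of the symmetric spectrum) is an assumption of this sketch.

```python
def spectral_difference(sa, ssa):
    """Sketch of ST1104: Diff = sum over i of (SAi - SSAi).

    sa  : current log spectral amplitudes SAi
    ssa : averaged log spectral amplitudes SSAi of past noise frames
    """
    return sum(cur - avg for cur, avg in zip(sa, ssa))
```

A positive Diff indicates that the current frame carries more energy than the averaged noise spectrum.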

[0149] Next, in ST1105, the counter is checked. The counter
indicates the number of times the decoded signal has
previously been judged as the stationary noise region. When
the counter value exceeds a predetermined value, in other
words, when the decoded signal has previously been judged as
the stationary noise region with some degree of stability, the
processing proceeds to ST1107. Otherwise, in other words, when
the decoded signal has seldom been judged as the stationary
noise region, the processing proceeds to ST1106. The
difference between ST1106 and ST1107 is whether or not the
spectral difference (Diff) is used as a judgment criterion.
The spectral difference (Diff) is calculated using the average
FFT logarithmic spectral amplitude (SSAi) of the region
previously judged as the stationary noise region. Obtaining
such an average requires a previous stationary noise region of
sufficient length, which is why ST1105 is provided. When no
previous stationary noise region of sufficient length exists,
the average FFT logarithmic spectral amplitude (SSAi) is
considered not to be sufficiently averaged, so the processing
proceeds to ST1106, in which the spectral difference (Diff) is
not used. The initial value of the counter is 0.

[0150] Next, in ST1106 or ST1107, it is judged whether or not
the decoded signal is the stationary noise region. In ST1106,
the decoded signal is judged to be the stationary noise region
when the excitation mode already determined in the speech
decoding apparatus is the stationary noise region mode. In
ST1107, the decoded signal is judged to be the stationary
noise region when the excitation mode already determined in
the speech decoding apparatus is the stationary noise region
mode and the spectral difference (Diff) calculated in ST1104
is equal to or less than the threshold k3. In ST1106 or
ST1107, the processing proceeds to ST1108 when the decoded
signal is judged to be the stationary noise region, and
proceeds to ST1113 when the decoded signal is judged not to be
the stationary noise region, in other words, to be the speech
region.
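The branch structure of ST1105 to ST1107 can be summarized as a small decision function. This is a sketch only: the name `is_stationary_noise` and the parameter `min_count` (standing in for the unspecified "predetermined value") are hypothetical, and no concrete values for `min_count` or `k3` are given in the text.

```python
def is_stationary_noise(mode_is_noise, diff, counter, min_count, k3):
    """Sketch of the ST1105-ST1107 decision.

    mode_is_noise : True if the decoder's excitation mode is the
                    stationary noise region mode
    diff          : spectral difference Diff from ST1104
    counter       : number of past frames judged as stationary noise
    min_count, k3 : illustrative thresholds (values not in the text)
    """
    if counter > min_count:
        # ST1107: the noise average is trustworthy, so Diff is also checked.
        return mode_is_noise and diff <= k3
    # ST1106: too little noise history, so rely on the mode alone.
    return mode_is_noise
```

With a short history the mode information decides alone; once enough noise frames have been accumulated, a large Diff can veto the noise decision even when the mode says noise.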
[0151] When it is judged that the decoded signal is the
stationary noise region, the smoothing processing is next
performed in ST1108 to obtain the average FFT logarithmic
spectrum (SSAi) of the stationary noise region. In the
equation in ST1108, p is a constant indicating the intensity
of smoothing, in the range of 0.0 to 0.1; p may be about 0.1
when the number of FFT points is 128 and the unit processing
time is 10 ms (80 points at 8 kHz sampling). The smoothing
processing is performed on all logarithmic spectral amplitudes
(SAi, i = 1, ..., N, where N is the number of FFT points).
[0152] Next, in ST1109, smoothing of the FFT logarithmic
spectral amplitude is performed to smooth the spectral
amplitude difference of the stationary noise region. The
smoothing processing is the same as that in ST1108. However,
the smoothing in ST1109 is not performed on all logarithmic
spectral amplitudes (SAi), but only on the signal components
at frequencies i such that the perceptually weighted
logarithmic spectral amplitude (WSAi) is equal to or less than
the threshold Th1. y in the equation in ST1109 is the same as
p in ST1108, and may have the same value as p. The partially
smoothed logarithmic spectral amplitude SSA2i is obtained in
ST1109.
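The equations of ST1108 and ST1109 appear only in the flowchart, so the following sketch assumes a first-order recursive (exponential) smoother, which is one common reading of a "smoothing intensity" constant. The function names and the pass-through of unsmoothed bins in the partial case are likewise assumptions, not the patent's stated formulas.

```python
def smooth_all(ssa, sa, p=0.1):
    """ST1108 sketch: update the running noise average SSAi at every bin.

    Assumed form: SSAi <- (1 - p) * SSAi + p * SAi.
    """
    return [(1.0 - p) * prev + p * cur for prev, cur in zip(ssa, sa)]


def smooth_partial(ssa2, sa, wsa, th1, y=0.1):
    """ST1109 sketch: smooth only bins whose weighted amplitude is <= Th1.

    Bins above the threshold are assumed to pass SAi through unchanged.
    """
    return [
        (1.0 - y) * prev + y * cur if w <= th1 else cur
        for prev, cur, w in zip(ssa2, sa, wsa)
    ]
```

Under this assumed form, a larger p or y makes the average follow the current frame more quickly, while a value near 0 keeps it close to its history.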
[0153] Next, in ST1110, the randomizing processing is
performed on the FFT spectral phase. The randomizing
processing is performed on signal components at selected
frequencies, in the same way as the smoothing processing in
ST1109. In other words, as in ST1109, the randomizing
processing is performed on the signal components at
frequencies i such that the perceptually weighted logarithmic
spectral amplitude (WSAi) is equal to or less than the
threshold Th1. Here Th1 may be set to the same value as in
ST1109, or to a different value adjusted to obtain higher
subjective quality. In addition, random(i) in ST1110 is a
randomly generated numerical value ranging from -2π to +2π.
random(i) may be generated anew every time. Alternatively, to
reduce the amount of computation, pre-generated random numbers
may be held in a table and used while cycling through the
contents of the table for each unit processing time. When the
table is used, two cases are possible: the contents of the
table are used without modification, or the contents of the
table are added to the FFT spectral phase.
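The table-based variant of ST1110 might look as follows. This is a sketch under several assumptions: the helper names are hypothetical, "circulating the contents of the table" is interpreted as advancing a read offset once per unit processing time, and the `additive` flag selects between the two table usages mentioned above.

```python
import math
import random


def make_phase_table(size, seed=0):
    """Pre-generate random phases in [-2*pi, +2*pi] (seed is illustrative)."""
    rng = random.Random(seed)
    return [rng.uniform(-2.0 * math.pi, 2.0 * math.pi) for _ in range(size)]


def randomize_phase(sp, wsa, th1, table, offset, additive=False):
    """ST1110 sketch: randomize the phase only where WSAi <= Th1.

    offset   : read position in the table, advanced by the caller once
               per unit processing time (an assumed rotation scheme)
    additive : False replaces the phase with the table value,
               True adds the table value to the original phase
    """
    out = []
    for i, (phase, w) in enumerate(zip(sp, wsa)):
        if w <= th1:
            r = table[(i + offset) % len(table)]
            out.append(phase + r if additive else r)
        else:
            out.append(phase)  # strong components keep their phase
    return out
```

Reusing a fixed table avoids calling a random number generator for every bin of every frame, which is the computational saving the paragraph refers to.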

[0154] Next, in ST1111, a complex FFT spectrum is generated
from the FFT logarithmic spectral amplitude and the FFT
spectral phase. The real part is obtained by converting the
FFT logarithmic spectral amplitude SSA2i from the logarithmic
domain back to the linear domain and then multiplying it by
the cosine of the spectral phase RSP2i. The imaginary part is
obtained by converting the FFT logarithmic spectral amplitude
SSA2i from the logarithmic domain back to the linear domain
and then multiplying it by the sine of the spectral phase
RSP2i.
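The polar-to-rectangular conversion of ST1111 can be sketched as below. The function name is hypothetical, and the logarithm is assumed to be base 10 applied directly to the amplitude (the text mentions the "common logarithmic" scale but does not give the exact scaling).

```python
import math


def to_complex_spectrum(log_amp, phase):
    """ST1111 sketch: rebuild real/imaginary parts from SSA2i and RSP2i.

    Assumes amplitude = 10 ** log_amp; the patent's exact log scaling
    is not stated in this passage.
    """
    re = [10.0 ** a * math.cos(p) for a, p in zip(log_amp, phase)]
    im = [10.0 ** a * math.sin(p) for a, p in zip(log_amp, phase)]
    return re, im
```

For example, a log amplitude of 1.0 with phase 0 gives a purely real value of 10, and a log amplitude of 0.0 with phase π/2 gives a purely imaginary value of 1.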
[0155] Next, in ST1112, the counter indicating the number of
regions judged as the stationary noise region is incremented
by 1.
[0156] On the other hand, when it is judged in ST1106 or
ST1107 that the decoded signal is the speech region (not the
stationary noise region), then in ST1113 the FFT logarithmic
spectral amplitude SAi is copied as the smoothed logarithmic
spectrum SSA2i. In other words, the smoothing processing of
the logarithmic spectral amplitude is not performed.

[0157] Next, in ST1114, the randomizing processing of the FFT
spectral phase is performed. The randomizing processing is
performed on signal components at selected frequencies, as in
ST1110. However, the threshold used to select the frequencies
is not Th1 but a value obtained by adding a constant k4 to
SSAi previously obtained in ST1108. This threshold is equal to
the second threshold Th2i in FIG. 6. In other words, the
spectral phase is randomized for signal components at
frequencies where the spectral amplitude is smaller than the
average spectral amplitude of the stationary noise region.
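The frequency selection for the speech region can be sketched as a per-bin mask. The function name is hypothetical, no value for k4 is given in the text, and the comparison is written as "equal to or less than" to match the description of second spectral phase randomizing section 710.

```python
def speech_region_mask(sa, ssa, k4):
    """ST1114 sketch: mark bins where SAi <= Th2i, with Th2i = SSAi + k4.

    Returns True for bins whose phase would be randomized, i.e. bins
    where the current amplitude does not rise above the noise average.
    """
    return [cur <= avg + k4 for cur, avg in zip(sa, ssa)]
```

Unlike Th1, this threshold varies with frequency because it follows the shape of the averaged noise spectrum.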
[0158] Next, in ST1115, a complex FFT spectrum is generated
from the FFT logarithmic spectral amplitude and the FFT
spectral phase. The real part is obtained by adding two
values: the value obtained by converting the FFT logarithmic
spectral amplitude SSA2i from the logarithmic domain back to
the linear domain and multiplying it by the cosine of the
spectral phase RSP2i, and the value obtained by converting the
FFT logarithmic spectral amplitude SSAi from the logarithmic
domain back to the linear domain, multiplying it by the cosine
of the spectral phase random2(i), and further multiplying the
result by the constant k5. The imaginary part is obtained by
adding two values: the value obtained by converting the FFT
logarithmic spectral amplitude SSA2i from the logarithmic
domain back to the linear domain and multiplying it by the
sine of the spectral phase RSP2i, and the value obtained by
converting the FFT logarithmic spectral amplitude SSAi from
the logarithmic domain back to the linear domain, multiplying
it by the sine of the spectral phase random2(i), and further
multiplying the result by the constant k5. The constant k5 is
in the range of 0.0 to 1.0, and is specifically set at about
0.25. In addition, k5 may be an adaptively controlled
variable. Multiplexing the average stationary noise multiplied
by k5 makes it possible to improve the subjective quality of
the background stationary noise in the speech region.
random2(i) is the same random number as random(i).
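The mixing in ST1115 amounts to adding an attenuated, random-phase copy of the averaged noise spectrum to the decoded spectrum. The sketch below uses a hypothetical function name and assumes base-10 logarithmic amplitudes, as no exact log scaling is given in this passage.

```python
import math


def mix_noise(ssa2, rsp2, ssa, rand2, k5=0.25):
    """ST1115 sketch: decoded spectrum plus k5 times the noise spectrum.

    ssa2, rsp2 : log amplitude and phase of the current frame
    ssa        : averaged log amplitude of past stationary noise
    rand2      : random phases random2(i) for the noise component
    Amplitudes are assumed to be base-10 logarithms.
    """
    re, im = [], []
    for a2, p2, a, r in zip(ssa2, rsp2, ssa, rand2):
        signal = 10.0 ** a2
        noise = k5 * 10.0 ** a
        re.append(signal * math.cos(p2) + noise * math.cos(r))
        im.append(signal * math.sin(p2) + noise * math.sin(r))
    return re, im
```

With all phases at zero and unit amplitudes, the real part is 1 + 0.25 = 1.25, which shows the noise entering at one quarter of its average level.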

[0159] Next, in ST1116, the IFFT is performed on the complex
FFT spectrum (Re(S2)i, Im(S2)i) generated in ST1111 or ST1115
to obtain a complex signal (Re(s2)i, Im(s2)i).

[0160] Finally, in ST1117, the real part Re(s2)i of the
complex signal obtained by the IFFT is output.
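The last two steps can be illustrated with a direct inverse DFT that keeps only the real part. This sketch deliberately swaps in a naive O(N²) inverse DFT for the fast transform so that it needs nothing beyond the standard library; the function name is hypothetical.

```python
import cmath


def idft_real(re, im):
    """ST1116/ST1117 sketch: inverse DFT, then keep only the real part.

    A plain O(N^2) inverse DFT stands in for the IFFT here; a real
    implementation would use a fast transform.
    """
    n = len(re)
    spectrum = [complex(a, b) for a, b in zip(re, im)]
    out = []
    for t in range(n):
        acc = sum(
            x * cmath.exp(2j * cmath.pi * k * t / n)
            for k, x in enumerate(spectrum)
        )
        out.append(acc.real / n)
    return out
```

A spectrum with only a DC component of 4 over four points yields a constant time signal of 1, and a flat spectrum of ones yields a unit impulse at the first sample.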
[0161] According to the multimode speech coding 
apparatus of the present invention, since the coding 
mode of the second coding section is determined using 
the coded result in the first coding section, it is possible 
to provide the second coding section with the multimode 
without adding any new information indicative of a 
mode, and thereby to improve the coding performance. 
[0162] In this constitution, the mode switching section
switches the mode of the second coding section that encodes
the excitation vector using the quantized parameter indicative
of the speech spectral characteristic, whereby, in a speech
coding apparatus that encodes parameters indicative of
spectral characteristics and parameters indicative of the
excitation vector independently of each other, it is possible
to provide the coding of the excitation vector with the
multimode without adding new transmission information, and
therefore to improve the coding performance.
[0163] In this case, since it is possible to detect the
stationary noise segment using dynamic characteristics 
for the mode selection, the excitation vector coding pro- 
vided with the multimode improves the coding perform- 
ance for the stationary noise segment. 
[0164] Further, in this case, the mode switching section
switches the mode of the processing section that encodes the
excitation vector using quantized LSP parameters, and
therefore it is possible to apply the present invention simply
to a CELP system that uses the LSP parameters as parameters
indicative of spectral characteristics. Furthermore, since the
LSP parameters, which are parameters in the frequency domain,
are used, it is possible to judge the stationarity of the
spectrum, and therefore to improve the coding performance for
stationary noises.
[0165] Moreover, in this case, the mode switching section
judges the stationarity of the quantized LSP using the
previous and current quantized LSP parameters, judges the
voiced characteristics using the current quantized LSP, and,
based on the judgment results, performs the mode selection of
the processing section that encodes the excitation vector.
It is thereby possible to code the excitation vector while
switching between the stationary noise segment, unvoiced
speech segment, and voiced speech segment, and therefore to
improve the coding performance by preparing a coding mode of
the excitation vector corresponding to each segment.

[0166] In the speech decoding apparatus of the present
invention, since it is possible to detect the case where the
power of a decoded signal suddenly increases, it is possible
to cope with the case where a detection error is caused by the
above-mentioned processing section that detects the speech
region.
[0167] Further, in the speech decoding apparatus of the
present invention, since it is possible to detect the
stationary noise segment using dynamic characteristics, the
excitation vector coding provided with the multimode improves
the coding performance for the stationary noise segment.

[0168] As described above, according to the present
invention, since the mode selection of speech coding and/or
decoding postprocessing is performed using the static and
dynamic characteristics in the quantized data of parameters
indicative of spectral characteristics, it is possible to
provide the speech coding with the multimode without newly
transmitting the mode information. In particular, since it is
possible to judge the speech region/non-speech region in
addition to the voiced region/unvoiced region, it is possible
to provide a speech coding apparatus and a speech decoding
apparatus in which the multimode further improves the coding
performance.

[0169] This application is based on Japanese Patent
Applications No. HEI 10-236147, filed on August 21, 1998, and
No. HEI 10-266883, filed on September 21, 1998, the entire
content of which is expressly incorporated by reference
herein.

Industrial Applicability 

[0170] The present invention is effectively applicable to a
communication terminal apparatus and base station apparatus in
a digital radio communication system.



1. A multimode speech coding apparatus comprising:

first coding means for coding at least one type
of parameter indicative of vocal tract information
contained in a speech signal;
second coding means for being capable of coding
said at least one type of parameter indicative of
vocal tract information with a plurality of modes;
mode switching means for switching a coding
mode of said second coding means based on a
dynamic characteristic of a specific parameter
coded in said first coding means; and
synthesis means for synthesizing an input
speech signal using a plurality of types of
parameter information coded in said first coding
means and said second coding means.

2. The multimode speech coding apparatus according
to claim 1, wherein said second coding means comprises
coding means for being capable of coding an
excitation vector with a plurality of coding modes,
and said mode switching means switches the coding
mode of said second coding means using a
quantized parameter indicative of a spectral
characteristic of a speech.

3. The multimode speech coding apparatus according
to claim 2, wherein said mode switching means 
switches the coding mode of said second coding 
means using a static characteristic and a dynamic 
characteristic of the quantized parameter indicative 
of the spectral characteristic of the speech. 

4. The multimode speech coding apparatus according
to claim 2, wherein said mode switching means
switches the coding mode of said second coding 
means using a quantized LSP parameter. 

5. The multimode speech coding apparatus according
to claim 4, wherein said mode switching means
switches the coding mode of said second coding 
means using a static characteristic and a dynamic 
characteristic of the quantized LSP parameter. 

6. The multimode speech coding apparatus according
to claim 4, wherein said mode switching means
comprises means for judging stationarity of the 
quantized LSP parameter using a previous quan- 
tized LSP parameter and a current quantized LSP 
parameter, and means for judging a voiced charac- 
teristic using the current quantized LSP parameter, 
and based on judged results, switches the coding 
mode of said second coding means. 

7. A multimode speech decoding apparatus comprising:

first decoding means for decoding at least one
type of parameter indicative of vocal tract
information contained in a speech signal;
second decoding means for being capable of
decoding said at least one type of parameter
indicative of vocal tract information with a
plurality of decoding modes;
mode switching means for switching a decoding
mode of said second decoding means
based on a dynamic characteristic of a specific
parameter decoded in said first decoding
means; and
synthesis means for decoding the speech signal
using a plurality of types of parameter
information decoded in said first decoding means
and said second decoding means.

8. The multimode speech decoding apparatus according
to claim 7, wherein said second decoding means
comprises decoding means for being capable of
decoding an excitation vector with a plurality
of decoding modes, and said mode switching
means switches the decoding mode of said second
decoding means using a quantized parameter
indicative of a spectral characteristic of a speech.

9. The multimode speech decoding apparatus according
to claim 8, wherein said mode switching means
switches the decoding mode of said second decoding
means using a static characteristic and a
dynamic characteristic of the quantized parameter
indicative of the spectral characteristic of the
speech.

10. The multimode speech decoding apparatus according
to claim 8, wherein said mode switching means
switches the decoding mode of said second decoding
means using a quantized LSP parameter.

11. The multimode speech decoding apparatus according
to claim 10, wherein said mode switching
means switches the decoding mode of said second
decoding means using a static characteristic and a
dynamic characteristic of the quantized LSP
parameter.

12. The multimode speech decoding apparatus according
to claim 10, wherein said mode switching
means comprises means for judging stationarity of
the quantized LSP parameter using a previous
quantized LSP parameter and a current quantized
LSP parameter, and means for judging a voiced
characteristic using the current quantized LSP
parameter, and based on judged results, switches
the decoding mode of said second decoding
means.



13. The multimode speech decoding apparatus according
to claim 7, wherein said apparatus switches
postprocessing for a decoded signal based on 
judged results. 

14. A quantized-LSP-parameter dynamic characteristic
extractor comprising: 

means for calculating an evolution of a quan- 
tized LSP parameter between frames; 
means for calculating an average quantized 
LSP parameter in a frame in which the quan- 
tized LSP parameter is stationary; and 
means for calculating an evolution between 
said average quantized LSP parameter and a 
current quantized LSP parameter. 

15. A quantized-LSP-parameter static characteristic 
extractor comprising: 

means for calculating linear prediction residual 
power using a quantized LSP parameter; and 
means for calculating a region between neigh- 
boring orders of the quantized LSP parameter. 

16. A multimode postprocessing apparatus comprising:

judgment means for judging whether or not a
region is a speech region using a decoded LSP
parameter;
FFT processing means for performing fast Fourier
transform processing on a signal;
spectral phase randomizing means for randomizing
a spectral phase obtained by said fast
Fourier transform processing corresponding to
a result judged by said judgment means;
spectral amplitude smoothing means for performing
smoothing on a spectral amplitude
obtained by said fast Fourier transform
processing corresponding to said result; and
IFFT processing means for performing inverse
fast Fourier transform on the spectral phase
randomized by said spectral phase randomizing
means and the spectral amplitude
smoothed by said spectral amplitude smoothing
means.

17. The multimode postprocessing apparatus according
to claim 16, wherein said device determines a
frequency of the spectral phase to be randomized 
using an average spectral amplitude of a previous 
non-speech region in a speech region, and deter- 
mines a frequency of the spectral phase to be ran- 
domized and the spectral amplitude to be 
smoothed using an average spectral amplitude with 
all frequencies in a perceptual weighted domain in 
a non-speech region.

18. The multimode postprocessing apparatus according
to claim 16, wherein said device multiplexes in a
speech region a noise generated using average
spectral amplitude in a previous non-speech
region.

19. A speech signal transmission apparatus having a
speech input apparatus that converts a speech signal
into an electric signal, an A/D converter that
converts a signal output from the speech input
apparatus into a digital signal, a multimode speech
coding apparatus that performs coding on the digital
signal output from the A/D converter, an RF
modulator that performs modulation processing on
coded information output from the multimode
speech coding apparatus, and a transmission
antenna that transmits a radio signal output from
the RF modulator, said multimode speech coding
apparatus comprising:

first coding means for coding at least one type
of parameter indicative of vocal tract information
contained in a speech signal;
second coding means for being capable of coding
said at least one type of parameter indicative of
vocal tract information with a plurality of modes;
mode switching means for switching a coding
mode of said second coding means based on a
dynamic characteristic of a specific parameter
coded in said first coding means; and
synthesis means for synthesizing an input
speech signal using a plurality of types of
parameter information coded in said first coding
means and said second coding means.

20. A speech signal reception apparatus having a
reception antenna that receives a radio signal, an
RF demodulator that performs demodulation
processing on the radio signal received at the
reception antenna, a multimode speech decoding
apparatus that performs decoding on information
obtained by the RF demodulator, a D/A converter
that converts a digital speech signal decoded in the
multimode speech decoding apparatus into an analog
signal, and a speech output apparatus that converts
an electric signal output from the D/A
converter into a speech signal, said multimode
speech decoding apparatus comprising:

first decoding means for decoding at least one
type of parameter indicative of vocal tract
information contained in a speech signal;
second decoding means for being capable of
decoding said at least one type of parameter
indicative of vocal tract information with a
plurality of decoding modes;
mode switching means for switching a decoding
mode of said second decoding means
based on a dynamic characteristic of a specific
parameter decoded in said first decoding
means; and
synthesis means for decoding the speech signal
using a plurality of types of parameter
information decoded in said first decoding means
and said second decoding means.

21. A computer readable recording medium with a
computer executable program recorded therein, the
program comprising the procedures of:

judging stationarity of a quantized LSP param- 
eter using a previous quantized LSP parameter 
and a current quantized LSP parameter; 
judging a voiced characteristic using the cur- 
rent quantized LSP parameter; and 
switching a mode of a procedure of coding an 
excitation vector, based on judged results. 
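
The three procedures recited in claim 21 can be pictured with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the claims: the squared-distance stationarity measure, the use of the narrowest gap between neighboring LSPs as the voiced indicator, both threshold values, and the mode names.

```python
def lsp_evolution(prev_lsp, cur_lsp):
    """Squared distance between the previous and current quantized LSP vectors."""
    return sum((c - p) ** 2 for p, c in zip(prev_lsp, cur_lsp))


def select_excitation_mode(prev_lsp, cur_lsp,
                           evolution_threshold=0.005,  # assumed value
                           gap_threshold=0.08):        # assumed value
    """Return a hypothetical mode label for coding the excitation vector."""
    # Procedure 1: judge stationarity from the previous and current LSPs.
    stationary = lsp_evolution(prev_lsp, cur_lsp) < evolution_threshold

    # Procedure 2: judge the voiced characteristic from the current LSPs only.
    # Assumption: a narrow gap between neighboring LSPs marks a strong
    # spectral peak, which is treated here as a sign of voiced speech.
    min_gap = min(b - a for a, b in zip(cur_lsp, cur_lsp[1:]))
    voiced = min_gap < gap_threshold

    # Procedure 3: switch the excitation coding mode on the judged results.
    if voiced:
        return "voiced"
    return "stationary_noise" if stationary else "unvoiced"
```

Because both judgments use only quantized LSP parameters, a decoder holding the same quantized values could repeat the same decision without any extra mode bits being transmitted.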

22. A computer readable recording medium with a
computer executable program recorded therein, the
program comprising the procedures of:

judging stationarity of a quantized LSP param- 
eter using a previous quantized LSP parameter 
and a current quantized LSP parameter; 
judging a voiced characteristic using the cur- 
rent quantized LSP parameter; 
switching a mode of a procedure of decoding 
an excitation vector, based on judged results; 
and 

switching a procedure of performing post- 
processing on a decoded signal, based on the 
judged results. 

23. A multimode speech coding method for performing 
mode switching of a mode for coding an excitation 
vector, using a static characteristic and a dynamic 
characteristic of a quantized parameter indicative of
a spectral characteristic of a speech. 

24. A multimode speech decoding method for perform- 
ing mode switching of a mode for decoding an exci- 
tation vector, using a static characteristic and a 
dynamic characteristic of a quantized parameter 
indicative of a spectral characteristic of a speech. 

25. The multimode speech decoding method according
to claim 24, said method comprising the steps of:

performing postprocessing on a decoded sig- 
nal; and 

switching the step of performing postprocess- 
ing, based on mode information. 

26. A quantized-LSP-parameter dynamic characteristic
extracting method comprising the steps of:

calculating an evolution of a quantized LSP 
parameter between frames;
calculating an average quantized LSP parame-
ter in a frame in which the quantized LSP
parameter is stationary; and
calculating an evolution between said average
quantized LSP parameter and a current quan-
tized LSP parameter.
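
The three steps of claim 26 can be sketched as a small stateful tracker. The squared-distance measure of "evolution", the exponential running average, the stationarity threshold, the smoothing factor, and the class and method names are all assumptions for illustration. The sketch also assumes that the distance to the average is measured before the current frame is folded into that average, a point the claim leaves open.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two LSP vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


class LspDynamicTracker:
    """Hypothetical helper that tracks dynamic features of quantized LSPs."""

    def __init__(self, stationary_threshold=0.005, smoothing=0.9):
        self.stationary_threshold = stationary_threshold  # assumed value
        self.smoothing = smoothing                        # assumed value
        self.prev_lsp = None
        self.avg_lsp = None

    def update(self, lsp):
        # Step 1: evolution of the quantized LSP between consecutive frames.
        if self.prev_lsp is None:
            frame_evolution = 0.0
        else:
            frame_evolution = squared_distance(self.prev_lsp, lsp)

        if self.avg_lsp is None:
            self.avg_lsp = list(lsp)

        # Step 3: evolution between the average LSP and the current LSP,
        # measured against the average of earlier stationary frames.
        average_evolution = squared_distance(self.avg_lsp, lsp)

        # Step 2: update the average only when the frame is stationary.
        if frame_evolution < self.stationary_threshold:
            s = self.smoothing
            self.avg_lsp = [s * a + (1.0 - s) * x
                            for a, x in zip(self.avg_lsp, lsp)]

        self.prev_lsp = list(lsp)
        return frame_evolution, average_evolution
```

Calling `update` once per frame returns the pair (frame-to-frame evolution, evolution from the stationary average); a mode decision could compare each value against its own threshold.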

27. A quantized-LSP-parameter static characteristic 
extracting method comprising the steps of:

calculating linear prediction residual power
using a quantized LSP parameter; and 
calculating a region between neighboring 
orders of the quantized LSP parameter. 
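
As an illustration of the two steps of claim 27, the following sketch derives both quantities from a vector of quantized LSP frequencies. It assumes an even prediction order, LSPs given in radians in ascending order, and the common textbook route of converting the LSPs to linear prediction coefficients and then to reflection coefficients, so that the normalized residual power is the product of (1 - k^2). The claim does not state that the residual power is computed this way, and all function names are hypothetical.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Convert LSP frequencies (radians) to A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    if len(lsp) % 2 != 0:
        raise ValueError("this sketch assumes an even prediction order")
    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]  # fixed roots at z = -1 and z = +1
    for i, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, factor)  # 1st, 3rd, ... LSPs
        else:
            q_poly = poly_mul(q_poly, factor)  # 2nd, 4th, ... LSPs
    return [0.5 * (p_poly[k] + q_poly[k]) for k in range(len(lsp) + 1)]


def residual_power(lpc):
    """Normalized prediction residual power via the step-down recursion."""
    a = list(lpc)
    power = 1.0
    for m in range(len(a) - 1, 0, -1):
        k = a[m]                      # reflection coefficient of order m
        denom = 1.0 - k * k
        if denom <= 0.0:              # unstable filter: no valid power
            return 0.0
        power *= denom
        a = [(a[i] - k * a[m - i]) / denom for i in range(m)]
    return power


def lsp_intervals(lsp):
    """Distances between LSPs of neighboring orders."""
    return [b - a for a, b in zip(lsp, lsp[1:])]
```

Evenly spaced LSPs correspond to a flat spectrum, for which the normalized residual power is 1; tightly clustered LSPs indicate a sharp spectral peak and give a much smaller value.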

28. A multimode postprocessing method comprising:

the judgment step of judging whether or not a 
region is a speech region using a decoded LSP 
parameter; 

the FFT processing step of performing fast
Fourier transform processing on a signal;
the spectral phase randomizing step of rand-
omizing a spectral phase obtained by said fast
Fourier transform processing corresponding to
a result determined by said judgment step;
the spectral amplitude smoothing step of per-
forming smoothing on a spectral amplitude
obtained by said fast Fourier transform
processing corresponding to said result; and
the IFFT processing step of performing inverse
fast Fourier transform on the spectral phase 
randomized by said spectral phase randomiz- 
ing step and the spectral amplitude smoothed 
by said spectral amplitude smoothing step. 
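
The processing chain of claim 28 can be sketched in a few lines. The sketch takes the outcome of the judgment step as a boolean argument and assumes that phase randomization and amplitude smoothing are applied only when the region is judged not to be speech; the claim says only that both steps correspond to the judgment result. The first-order smoothing against the previous frame's amplitudes, the factor `alpha`, the power-of-two frame length, and all names are likewise assumptions.

```python
import cmath
import math
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT; the inverse is left unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def postprocess(frame, is_speech, prev_amp=None, alpha=0.9, rng=None):
    """Return (processed frame, amplitude spectrum to carry to the next frame)."""
    n = len(frame)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two frame"
    rng = rng or random.Random()
    spec = fft([complex(v) for v in frame])           # FFT processing step
    amp = [abs(c) for c in spec]
    phase = [cmath.phase(c) for c in spec]
    if not is_speech:
        # Spectral phase randomizing step; mirrored bins keep the
        # spectrum conjugate-symmetric so the output stays real.
        for k in range(1, n // 2):
            phase[k] = rng.uniform(-math.pi, math.pi)
            phase[n - k] = -phase[k]
        # Spectral amplitude smoothing step (assumed first-order smoothing).
        if prev_amp is not None:
            amp = [alpha * p + (1.0 - alpha) * a
                   for p, a in zip(prev_amp, amp)]
    spec = [cmath.rect(a, ph) for a, ph in zip(amp, phase)]
    out = [c.real / n for c in fft(spec, inverse=True)]  # IFFT processing step
    return out, amp
```

In a speech region the function returns the input frame unchanged up to rounding error. In a non-speech region with no previous amplitudes, the frame keeps its energy but loses its phase structure.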
